The Department of Government Efficiency, also known as DOGE, has obtained unprecedented access to at least seven sensitive federal databases, including those of the Internal Revenue Service and the Social Security Administration. This access has sparked concerns about cybersecurity vulnerabilities and privacy violations. However, another concern has received relatively little attention: the potential use of this data to train a private company’s artificial intelligence systems.

The White House press secretary has stated that the government data collected by DOGE is not being used to train Elon Musk’s AI models, despite Musk’s control over DOGE. Nevertheless, evidence has emerged that DOGE personnel hold simultaneous positions with at least one of Musk’s companies, creating a potential conduit for federal data to be siphoned to Musk-owned enterprises, including xAI. The company’s latest Grok AI chatbot model refuses to provide a clear denial about using such data.

As a political scientist and technologist familiar with public sources of government data, I believe that the potential transmission of government data to private companies presents significant privacy and power implications that have not been fully addressed. A private entity with the capacity to develop artificial intelligence technologies could use government data to gain a substantial advantage over its competitors and wield considerable influence over society.

Value of government data for AI

For AI developers, government databases are akin to finding the Holy Grail. While companies like OpenAI, Google, and xAI currently rely on information scraped from the public internet, nonpublic government repositories offer something far more valuable: verified records of actual human behavior across entire populations.

This is not merely more data – it is fundamentally different data. Social media posts and web browsing histories show curated or intended behaviors, but government databases capture real decisions and their consequences. For instance, Medicare records reveal health care choices and outcomes, while IRS and Treasury data reveal financial decisions and long-term impacts.

What makes this data particularly valuable for AI training is its longitudinal nature and reliability. Unlike the disordered information available online, government records follow standardized protocols, undergo regular audits, and must meet legal requirements for accuracy. Every Social Security payment, Medicare claim, and federal grant creates a verified data point about real-world behavior.

Most critically, government databases track entire populations over time, not just digitally active users. They include people who never use social media, don’t shop online, or actively avoid digital services. For an AI company, this would mean training systems on the actual diversity of human experience rather than just digital reflections.

The technical advantage

Current AI systems face fundamental limitations that no amount of data scraped from the internet can overcome. When ChatGPT or Google’s Gemini make mistakes, it’s often because they’ve been trained on information that might be popular but isn’t necessarily true. They can tell you what people say about a policy’s effects, but they can’t track those effects across populations and years.

Government data could change this equation. Imagine training an AI system not just on opinions about health care but on actual treatment outcomes across millions of patients. Consider the difference between learning from social media discussions about economic policies and analyzing their real impacts across different communities and demographics over decades.

A large, state-of-the-art model trained on comprehensive government data could understand the actual relationships between policies and outcomes. It could track unintended consequences across different population segments, model complex societal systems with real-world validation, and predict the impacts of proposed changes based on historical evidence.

Control of critical systems

A company like xAI could do far more with models trained on government data than building better chatbots or content generators. Such systems could fundamentally transform – and potentially control – how people understand and manage complex societal systems. While some of these capabilities could be beneficial under the control of accountable public agencies, I believe they pose a threat in the hands of a single private company.

Medicare and Medicaid databases contain records of treatments, outcomes, and costs across diverse populations over decades. A frontier model trained on new government data could identify treatment patterns that succeed where others fail, and the company that owns it could come to dominate the health care industry.

Treasury data represents perhaps the most valuable prize. Government financial databases contain granular details about how money flows through the economy. An AI company with access to this data could develop extraordinary capabilities for economic forecasting and market prediction.

Infrastructure and urban systems

Government databases contain information about critical infrastructure usage patterns, maintenance histories, emergency response times, and development impacts. Every federal grant, infrastructure inspection, and emergency response creates a data point that could help train AI to better understand how cities and regions function.

The power lies in the potential interconnectedness of this data. An AI system trained on government infrastructure records would understand how transportation patterns affect energy use, how housing policies affect emergency response times, and how infrastructure investments influence economic development across regions.

Absolute data corrupts absolutely

A company such as xAI, with Musk’s resources and preferential access through DOGE, could surmount technical and political obstacles far more easily than competitors. The threat of a private company accessing government data transcends individual privacy concerns. Even with personal identifiers removed, an AI system that analyzes patterns across millions of government records could enable surprising capabilities for making predictions and influencing behavior at the population level.

Since information is power, concentrating unprecedented data in the hands of a private entity with an explicit political agenda represents a profound challenge to the republic. I believe that the question is whether the American people can stand up to the potentially democracy-shattering corruption such a concentration would enable.

Allison Stanger, Distinguished Endowed Professor, Middlebury

This article is republished from The Conversation under a Creative Commons license. Read the original article.

