The Unit Economics of Data Chaos
Working on GiveSpark, I’ve gone through all five stages of grief—denial, anger, bargaining, depression, and acceptance. Every time I fix one issue with our data sources, ten more appear that make no sense. IRS forms are nominally strict and structured, yet in practice an absolute mess. I’ve dealt with chaotic data before, but I never expected a major government agency to be this disorganized.
This frustration isn’t new for me. It’s a pattern I keep encountering, and I’m starting to think it reveals something fundamental about our industry.
We Keep Building the Same Thing
Engineers thrive on structure. We build data structures, organize them, transform them, share them. And business people desperately want structure too—they demand CRMs, ERPs, project management tools, anything to run their operations efficiently.
But reality is entropic. The world resists our schemas. Government bureaucracies layer decades of legacy decisions into forms that contradict themselves. Businesses evolve faster than their systems. Edge cases multiply. Every attempt to impose order creates new pockets of chaos at the boundaries.
Here’s the uncomfortable question: don’t most companies essentially use the same nomenclature with minor variations? Every service company has the same core data—customers, service types, reports, vendors. The specifics differ, but the bones are identical.
The industry knows this is a problem. In fact, the biggest data company in the world was built on this exact realization. Recently, I stumbled across this video about how Palantir Foundry works. It’s not new, but it crystallized something I’ve been circling for years. Foundry is incredibly sophisticated, but at its core, it does three things:
- Collects raw data from different sources
- Merges data into clean, unified datasets
- Uses those datasets to power dashboards and actions
The cherry on top? One of Palantir’s key selling points is that they’ve pre-built collectors and mergers for systems like SAP and Oracle. They’ve industrialized the integration problem.
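The three-step pattern above is simple enough to sketch in a few lines. This is a toy illustration of collect → merge → serve, not Foundry’s actual API; every name here (`collect`, `merge`, `serve`, the toy `crm`/`erp` connectors) is an assumption for demonstration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Record:
    source: str
    data: dict

def collect(sources: dict[str, Callable[[], list[dict]]]) -> list[Record]:
    """Step 1: pull raw rows from each connector."""
    return [Record(name, row) for name, fetch in sources.items() for row in fetch()]

def merge(records: list[Record], key: str) -> dict[str, dict]:
    """Step 2: unify rows from different systems under one normalized key."""
    unified: dict[str, dict] = {}
    for r in records:
        k = str(r.data[key]).strip().lower()
        unified.setdefault(k, {}).update(r.data)
    return unified

def serve(unified: dict[str, dict]) -> list[dict]:
    """Step 3: expose the clean dataset to dashboards and actions."""
    return list(unified.values())

# Toy connectors standing in for SAP/Oracle-style sources:
crm = lambda: [{"customer_id": "A1", "name": "Acme"}]
erp = lambda: [{"customer_id": "a1", "balance": 250}]

rows = serve(merge(collect({"crm": crm, "erp": erp}), key="customer_id"))
print(rows)  # one unified row per customer, fields from both systems
```

The hard part, of course, is not this skeleton—it’s the thousands of connector- and schema-specific decisions hiding inside each step.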
So why are we constantly rebuilding the same infrastructure from scratch? Why does every company treat their data architecture like a unique snowflake when the underlying patterns are nearly universal?
I’ve Been Here Before
This isn’t the first time I’ve had this realization.
In 2019-2020, I was working on ComCard, a fintech startup in corporate payments. We were building online banking with customizable features—not generic customization, but industry-specific configurations. For construction companies, we wanted tighter connections to projects and materials. For interior designers, better integration with furniture suppliers and design tools. We planned to build these vertical configurations for every industry.
Then COVID hit. Our bet on charge cards for construction went up in flames before our eyes. We finished our beta in early March 2020—perfect timing to watch our market evaporate.
While trying to save the company, we pivoted to a bigger idea: banking OS. Online banking software for financial institutions with the ability to install apps that extend functionality. A platform, not just a product.
Technically, it was the right move. Economically, we had jumped from a hard distribution problem to an impossible one.
The Real Problem Isn’t Software
The average sales cycle for core banking software is measured in years, not months. And being a new company with no track record, no bank was going to trust us with a critical piece of their infrastructure. We didn’t even seriously attempt to sell it—the math didn’t work. Our runway couldn’t survive their procurement timeline.
We shut down operations in August 2020 and dissolved the company by year’s end.
Here’s what I learned: we didn’t fail because the software was wrong. We failed because there was no path to revenue that fit startup economics.
And this connects directly back to those IRS forms I’m wrestling with now.
The reason serving small businesses is so expensive isn’t the complexity of the software—it’s the complexity of the data. Messy, inconsistent data requires human engineers to untangle. You can’t automate the edge cases when every client’s data has different edges. That human cost doesn’t scale down. Serving a 20-person company costs nearly as much as serving a 2,000-person company, but the revenue is 100x smaller.
If data were clean and standardized, the software could be $50/month. But data is never clean. The messiness creates the cost; the cost kills the distribution.
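The trap is easy to see with back-of-envelope math. All numbers below are illustrative assumptions, not figures from any real contract: the point is only the ratio between integration cost and revenue.

```python
def payback_months(monthly_revenue: float, onboarding_cost: float) -> float:
    """Months of revenue needed to recoup the human engineering cost
    spent untangling one customer's messy data."""
    return onboarding_cost / monthly_revenue

# Hypothetical SMB: $50/month against even a modest $5,000 of engineer time.
smb = payback_months(50, 5_000)

# Hypothetical enterprise: a $10M/year contract absorbing $1M of engineering.
enterprise = payback_months(10_000_000 / 12, 1_000_000)

print(f"SMB payback: {smb:.0f} months")          # ~8 years of subscription
print(f"Enterprise payback: {enterprise:.1f} months")
```

With these assumed numbers, the SMB customer takes over eight years to break even while the enterprise pays for itself in about a month—which is the whole asymmetry in two lines of arithmetic.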
This is Palantir’s real moat. It’s not their ontology engine or their data fusion capabilities. It’s their government relationships, their enterprise sales machine, their ability to land $10M+ contracts that justify the human cost of implementation. They survived their early years because they had Peter Thiel’s backing and government contracts with long timelines but guaranteed revenue. They could afford to throw engineers at messy data because the contracts covered it.
A startup trying to build “Foundry for SMBs” faces the inverse problem. Small businesses churn constantly, have tiny budgets, and need significant hand-holding. The unit economics are brutal.
The graveyard of “Palantir for the rest of us” is filled with technically sound products that couldn’t acquire customers profitably.
Does AI Change the Math?
We have powerful tools already—Databricks, dbt, Snowflake, Apache Flink, and Spark. But nothing that people without technical knowledge can actually use. A small local business can’t spend seven figures on Foundry to optimize its five-person operation. But to grow, they need real visibility into their business, not a hundred disconnected Google Sheets.
Here’s where LLMs might actually matter—not as dashboard builders or chatbot interfaces, but as entropy reducers.
The expensive part of serving SMBs has always been the human engineering required to wrangle messy data into structure. AI is genuinely good at this specific task: taking inconsistent text, contradictory formats, and edge-case chaos and normalizing them into clean JSON. The thing that used to require an engineer staring at IRS forms for hours can now happen programmatically.
If AI can dramatically reduce the cost of data normalization, it changes the unit economics. Suddenly, serving a 20-person company doesn’t require the same human overhead as serving an enterprise. The margin structure that made “Palantir for SMBs” impossible might finally become viable.
Or maybe the distribution problem remains unsolved regardless of how good the technology gets. Maybe Palantir’s moat isn’t the cost of implementation at all—it’s the trust required to hand over your data, the switching costs once you’re embedded, the enterprise relationships that take decades to build.
I don’t have the answer. But after a decade of watching this pattern—building data infrastructure at scale, attempting platforms that die on distribution, and now wrestling with government data chaos—I keep coming back to the same question:
Why do we keep rebuilding the same things, and what would it actually take to stop?