This article covers Matei Zaharia, co-founder and CTO of Databricks, receiving the 2026 ACM Prize in Computing for his PhD-era work on Spark, and what his views suggest about the future of data platforms, AI research, and scientific discovery. It ties the award to the broader trajectory of cloud data infrastructure, open-source innovation, and the responsible use of artificial intelligence in science.
Spark’s Legacy: The data revolution sparked by an open-source breakthrough
At the center of this story sits Spark, the open-source data-processing engine Zaharia started building during his PhD at UC Berkeley. Spark changed big-data analytics by enabling scalable in-memory processing across clusters, which sped up iterative and interactive workloads and opened new doors for data-driven research and industry.
The project’s influence stretches from academic labs to the cloud. Databricks, built on Spark’s core ideas, has grown into a major data and AI platform.
Researchers and engineers still lean on Spark to organize, process, and pull insights from huge datasets. The open-source approach behind Spark also fueled the rapid rise of cloud-native data platforms, kicking off a new era of collaborative data engineering and AI-powered analytics.
ACM Prize in Computing: A PhD-era breakthrough recognized
The 2026 ACM Prize in Computing recognizes Zaharia for his foundational Spark work. This honor highlights how one research breakthrough can ripple through both academia and industry.
The prize comes with a $250,000 cash award. Zaharia plans to donate it to charity, a choice that signals a commitment to societal impact beyond his company and his research.
He also remains an associate professor at UC Berkeley, still pushing forward research at the crossroads of data and AI.
From Spark to Databricks: Building a cloud data foundation
Zaharia’s journey from Spark to co-founding Databricks shows how academic ideas can turn into a leading cloud and data company. Databricks has raised over $20 billion and is valued at about $134 billion.
The company's revenue run rate currently sits near $5.4 billion. That scale points to a bigger shift in data infrastructure: bringing data engineering, analytics, and AI tools together in unified platforms so researchers and businesses can work with data faster.
The company’s ecosystem has helped launch a new generation of data scientists and engineers. They use collaborative notebooks, managed clusters, and smooth data pipelines to speed up discovery and product development.
Zaharia’s leadership at Databricks bridges the gap between pioneering research and scalable, real-world data operations.
Leadership, academia, and ongoing research
Even while serving as CTO, Zaharia keeps strong ties to academia. His ongoing role as an associate professor at UC Berkeley reflects a continued commitment to advancing both theory and practical engineering at the data-AI frontier.
He often says the most exciting opportunities are in tools that automate and speed up research across fields, from biology to engineering, while holding to high standards for reliability and reproducibility.
AI and the future of scientific discovery
For Zaharia, how we measure AI really matters. He argues that AGI-like capabilities already exist in some systems, but we often make the mistake of judging models only by human standards.
This, he says, brings risk: treating AI as if it thinks like a human can hide security hazards and make us too reliant on automation. He has pointed to real-world concerns, like the OpenClaw agent, as a warning sign of how agents with sensitive access could act on their own in ways we don’t expect.
Yet, Zaharia remains optimistic about what AI can do for research. He imagines AI tools that cut down on hallucinations, organize complex info, and make tough concepts easier to grasp.
The goal isn’t to copy human thinking, but to boost our abilities—especially in raw data processing, multimodal sensing, and quick hypothesis testing. That way, scientists and engineers can spend more time on creative problem-solving and discovery.
Practical implications for scientists and engineers
For researchers, labs, and tech teams, Zaharia’s perspective points to several clear directions for adopting AI in science and engineering. Here are a few key takeaways:
- Use AI to automate repetitive data curation, experiment planning, and result synthesis. That frees researchers to focus on new, interesting questions.
- Design AI tools that highlight uncertainty, hallucination risks, and where the data came from. Reliability matters more than ever.
- Set up strong security controls and make sure everything’s auditable when using autonomous agents or AI-assisted workflows. That way, you reduce the risk of unintended actions.
- Encourage collaboration between academia and industry. It’s the best way to make sure AI tools stay rigorous and reproducible.
- Support philanthropic and societal goals tied to AI-enabled discovery. Zaharia’s charitable approach to the ACM Prize sets a great example.
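As one illustration of the security and auditability point above, here is a minimal Python sketch of an allowlist plus audit log around an agent's tool calls. It is an assumption-laden example, not anything Zaharia or Databricks has described: the `AuditedToolbox` class, the `read_summary` tool, and the agent and dataset names are all hypothetical.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AuditedToolbox:
    """Hypothetical wrapper: only allowlisted tools run, and every attempt is logged."""

    allowed: Dict[str, Callable[..., Any]]
    log: List[dict] = field(default_factory=list)

    def call(self, agent_id: str, tool: str, **kwargs: Any) -> Any:
        # Record who asked for what before deciding whether to run it.
        entry = {"ts": time.time(), "agent": agent_id, "tool": tool, "args": kwargs}
        if tool not in self.allowed:
            entry["outcome"] = "denied"
            self.log.append(entry)
            raise PermissionError(f"tool {tool!r} is not on the allowlist")
        try:
            result = self.allowed[tool](**kwargs)
        except Exception as exc:
            # Failed calls are logged too, then re-raised to the caller.
            entry["outcome"] = f"error: {exc}"
            self.log.append(entry)
            raise
        entry["outcome"] = "ok"
        self.log.append(entry)
        return result

    def export_log(self) -> str:
        # One JSON object per line, so the trail is easy to store and review.
        return "\n".join(json.dumps(e, sort_keys=True, default=str) for e in self.log)


def read_summary(dataset: str) -> str:
    """Hypothetical read-only tool the agent is allowed to use."""
    return f"summary of {dataset}"


toolbox = AuditedToolbox(allowed={"read_summary": read_summary})
result = toolbox.call("agent-1", "read_summary", dataset="trial_42")

try:
    # Not on the allowlist, so this is refused and recorded.
    toolbox.call("agent-1", "delete_records", dataset="trial_42")
    denied = False
except PermissionError:
    denied = True

print(toolbox.export_log())
```

The design choice here is that denied and failed attempts are logged alongside successful ones, so a reviewer can see what an agent tried to do as well as what it did.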
Spark’s influence still shapes how we store, process, and make sense of data, and it continues to steer scientists toward faster, more reliable discoveries in this AI-augmented era.
Here is the source article for this story: Databricks co-founder wins prestigious ACM award, says ‘AGI is here already’