Apache Airflow 3.0 aims to solve a growing problem: batch data processing is too slow for real-time AI.
Moving data from many different sources into the right place for AI is difficult, and data orchestration technologies such as Apache Airflow can help. Apache Airflow 3.0 has now debuted as the project's first major release in four years. Airflow has remained active in the 2.x series, with the 2.9 and 2.10 releases in 2024 focusing heavily on AI.

Apache Airflow is the most popular open-source workflow orchestration platform, with more than 3,000 contributors and wide adoption among Fortune 500 companies. As organizations struggle to coordinate workflows across disparate platforms, clouds and AI workloads, their orchestration needs keep growing. Apache Airflow 3.0 addresses those critical enterprise needs with an architectural redesign that could improve how organizations build and deploy data applications.
“To me, Airflow 3 is a new beginning, it is a foundation for a much greater set of capabilities,” Vikram Koka, Apache Airflow PMC (project management committee) member and Chief Strategy Officer at Astronomer, told VentureBeat in an exclusive interview. “This is almost a complete refactor based on what enterprises told us they needed for the next level of mission-critical adoption.”
Enterprise data complexity has changed data orchestration needs
As businesses increasingly rely on data-driven decision-making, the complexity of data workflows has exploded. Organizations now manage intricate pipelines spanning multiple cloud environments, diverse data sources and increasingly sophisticated AI workloads.
Airflow 3.0 emerges as a solution designed specifically for these evolving enterprise needs. A departure from previous versions, the release introduces a distributed client-based model that provides greater flexibility and stronger security. This new architecture allows enterprises to:
- Execute tasks across multiple cloud environments.
- Implement granular security controls.
- Support diverse programming languages.
- Enable true multi-cloud deployments.
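In practice, this client-based model reaches DAG authors through Airflow 3.0's new task SDK, which decouples task code from the scheduler's internals. Below is a minimal sketch, assuming the airflow.sdk package that ships with 3.0; the DAG and task names are illustrative placeholders, not from the article.

```python
# A minimal sketch of a DAG written with Airflow 3.0's task SDK (an
# assumption about the public API surface; all names are illustrative).
from airflow.sdk import dag, task


@dag(schedule=None)  # manually triggered; event-driven options shown later
def example_pipeline():
    @task
    def extract() -> dict:
        # In the client-based model, this code runs on a worker that talks
        # to the scheduler through a stable API instead of importing its
        # internals, which is what enables remote and multi-cloud execution.
        return {"rows": 42}

    @task
    def load(payload: dict) -> None:
        print(f"loaded {payload['rows']} rows")

    load(extract())


example_pipeline()
```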
Airflow 3.0’s expanded language support is also noteworthy. Previous versions were primarily Python-centric, but the new release supports multiple programming languages natively.
Airflow is expected to support Python and Go, with Java, TypeScript and Rust planned. This approach means data engineers can write tasks in their preferred programming language, reducing friction in workflow development and integration.

Event-driven capabilities transform data workflows

Airflow has traditionally excelled at scheduled batch processing, but enterprises increasingly need real-time data processing capabilities. Airflow 3.0 now supports that need.

“A key change in Airflow 3 is what we call event-driven scheduling,” Koka explained.
Instead of running a data processing job every hour, Airflow now automatically starts the job when a specific data file is uploaded or when a particular message appears. This could include data loaded into an Amazon S3 cloud storage bucket or a streaming data message in Apache Kafka.
The event-driven scheduling capability addresses a critical gap between traditional ETL [extract, transform and load] tools and stream processing frameworks like Apache Flink or Apache Spark Structured Streaming, allowing organizations to use a single orchestration layer for both scheduled and event-triggered workflows.
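What an event-triggered DAG might look like is sketched below. This is a hedged example, not code from the release announcement: it assumes the Asset and AssetWatcher classes from Airflow 3.0's airflow.sdk package and the MessageQueueTrigger from the common messaging provider, and the queue URI, asset name and DAG name are all placeholders.

```python
# A sketch of event-driven scheduling: the DAG fires when a message arrives
# on a queue rather than on a cron schedule. Assumes Airflow 3.0's Asset and
# AssetWatcher plus the common messaging provider's MessageQueueTrigger; the
# queue URI and all names are placeholders.
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import Asset, AssetWatcher, dag, task

# Watch an external queue (e.g. Kafka or Amazon SQS) and record an asset
# event whenever a new message lands.
new_upload = Asset(
    "uploaded_file",
    watchers=[
        AssetWatcher(
            name="upload_watcher",
            trigger=MessageQueueTrigger(queue="kafka://broker:9092/uploads"),  # placeholder URI
        )
    ],
)


@dag(schedule=[new_upload])  # runs on the event, not on a clock
def process_upload():
    @task
    def handle() -> None:
        ...  # react to the newly arrived data

    handle()


process_upload()
```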
Airflow will accelerate enterprise AI inference execution and compound AI
Event-driven data orchestration will also help Airflow support rapid inference execution. Koka provided an example from professional services: legal time tracking, where real-time inference runs as events occur. Large language models (LLMs) transform unstructured time entries into structured data, and a pre-trained AI model then analyzes that structured data to determine whether the work is billable and assigns the appropriate billing codes. This type of multi-step, real-time inference process is possible with Airflow 3.0 thanks to its event-driven architecture.

This multi-step pattern is an example of compound AI, an approach defined in 2024 by the Berkeley Artificial Intelligence Research Center that is distinct from agentic AI. Koka explained that agentic AI allows for autonomous AI decision-making, whereas compound AI uses predefined workflows that are more predictable and reliable for business use cases.
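As a rough illustration of how such a compound AI pipeline could be laid out as predefined Airflow steps, consider the sketch below. Every helper shown (the LLM call, the billability classifier, the billing-code lookup) is a hypothetical stand-in, not a real API from the article; only the pipeline shape comes from Koka's example.

```python
# A sketch of the legal time-tracking flow as predefined Airflow tasks. The
# "LLM" and "model" logic below are hypothetical stand-ins for real
# inference calls.
from airflow.sdk import dag, task


@dag(schedule=None)
def legal_time_tracking():
    @task
    def structure(raw_note: str = "Reviewed merger docs, 1.5h") -> dict:
        # Step 1: an LLM would turn the unstructured note into fields.
        return {"activity": "document review", "hours": 1.5}

    @task
    def classify(entry: dict) -> dict:
        # Step 2: a pre-trained model would decide whether the work is billable.
        entry["billable"] = True
        return entry

    @task
    def assign_code(entry: dict) -> dict:
        # Step 3: map billable work to the appropriate billing code.
        entry["billing_code"] = "L120" if entry["billable"] else None
        return entry

    assign_code(classify(structure()))


legal_time_tracking()
```

Because each step is a predefined task rather than an autonomous agent decision, the workflow stays predictable and auditable, which is the compound AI property Koka highlights.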
Playing ball with Airflow: how the Texas Rangers look to benefit
Among Airflow's many users is the Texas Rangers Major League Baseball team.

Oliver Dykstra, full-stack data engineer at the Texas Rangers Baseball Club, told VentureBeat that the team uses Airflow hosted on Astronomer’s Astro platform as the ‘nerve center’ of its baseball data operations. Airflow manages all aspects of player development, contracts, analytics and game data. Dykstra said the team is looking forward to upgrading to Airflow 3 for its enhanced event-driven scheduling, data lineage and observability capabilities. “As we already rely on Airflow to manage our critical AI/ML pipelines, the added efficiency and reliability of Airflow 3 will help increase trust and resiliency of these data products within our entire organization.”
What this means for enterprise AI adoption
For technical decision-makers evaluating data orchestration strategy, Airflow 3.0 delivers actionable benefits that can be implemented in phases.
The first step is to evaluate which current data workflows would benefit from the new event-driven capabilities. Organizations can identify data pipelines that currently run on fixed schedules but could be handled more efficiently with event-based triggers. This shift can significantly reduce processing latency while eliminating wasteful polling operations.
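A before-and-after sketch of that migration is shown below. It assumes Airflow 3.0's asset-based scheduling via airflow.sdk; the asset and DAG names are invented for illustration.

```python
# Before/after sketch of migrating a polling pipeline to an event trigger,
# assuming Airflow 3.0's asset-based scheduling; all names are invented.
from airflow.sdk import Asset, dag, task

raw_orders = Asset("raw_orders")  # updated by an upstream producer or watcher


@dag(schedule="@hourly")  # before: runs on a clock, even when nothing changed
def orders_hourly():
    @task
    def process() -> None: ...

    process()


@dag(schedule=[raw_orders])  # after: runs only when new data actually arrives
def orders_event_driven():
    @task
    def process() -> None: ...

    process()


orders_hourly()
orders_event_driven()
```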
Next, technology leaders should assess their development environments to determine whether Airflow's new language support could consolidate fragmented orchestration tools. For enterprises leading the way on AI implementation, Airflow 3.0 is a key infrastructure component that addresses a major challenge: orchestrating multi-stage AI workflows at scale. The platform's ability to coordinate compound AI systems could help organizations move beyond proof-of-concept to enterprise-wide AI deployment with proper governance, security and reliability.