May 25, 2024
Weka scores $140M to supercharge AI workloads with ‘dynamic data pipelines’

Data is the lifeblood of modern AI systems, but making it available for these complex workloads is no small task. Today, Weka, a startup working to simplify this challenge by making data continuously available on demand with a unique AI-native offering, announced it has raised a $140 million Series E round on the back of significant customer interest. The investment, which came entirely from existing investors, has taken the company’s valuation to $1.6 billion, double what it was in November 2022.

“Few could have predicted just how quickly the AI market would take off… But when [generative AI] exploded on the scene in December 2022, global demand for Weka’s data platform software skyrocketed, with large enterprise customers and research organizations looking to accelerate their AI initiatives,” Jonathan Martin, president at Weka, told VentureBeat. 

The company plans to use the fresh capital across multiple areas, most notably to further enhance its platform: a software-based solution that eliminates data bottlenecks stemming from legacy architectures and creates a “dynamic pipeline” that continuously feeds data to GPUs and AI workloads, making them more efficient and sustainable.

What does Weka bring to the table?

Even as enterprise leaders continue to reiterate their commitment to modern workloads like generative AI, downstream teams are struggling to bring those projects to life because of data silos and gaps stemming from legacy architectures. A typical generative AI pipeline involves copying datasets multiple times, which creates bottlenecks that slow down training and burn more energy.

Founded in 2013, Weka is solving this problem with what it calls a “dynamic data pipeline.” Essentially, the company’s software-based data platform uses what it describes as a unique zero-copy architecture, which eliminates time-intensive copying and accelerates each step of the AI pipeline to keep GPUs constantly fed with data. This ultimately allows models to be trained faster and more efficiently, leading to faster time to insight and better business outcomes.
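
The general difference between copying data and sharing it can be illustrated with a short sketch. This is not Weka’s implementation, and the article does not describe how WekaFS achieves zero-copy access. It is a minimal Python illustration, assuming a single in-memory buffer stands in for stored training data and three hypothetical pipeline stages. One version hands each stage its own copy. The other hands each stage a view over the same bytes. The function names are made up for the example.

```python
# Illustrative only: a generic copy-vs-view comparison, not Weka's implementation.


def copying_pipeline(storage: bytearray, num_stages: int) -> list:
    """Give each hypothetical stage (e.g. ingest, preprocess, train) its own copy."""
    # bytearray(storage) allocates a new buffer and copies every byte into it.
    return [bytearray(storage) for _ in range(num_stages)]


def zero_copy_pipeline(storage: bytearray, num_stages: int) -> list:
    """Give each hypothetical stage a read-only view of the same underlying buffer."""
    shared = memoryview(storage).toreadonly()
    # Slicing a memoryview creates another view over the same memory; no bytes move.
    return [shared[:] for _ in range(num_stages)]


def bytes_copied(storage: bytearray, stage_inputs: list) -> int:
    """Count the bytes held by stage inputs that are NOT views over `storage`."""
    return sum(
        len(buf)
        for buf in stage_inputs
        if not (isinstance(buf, memoryview) and buf.obj is storage)
    )


if __name__ == "__main__":
    dataset = bytearray(b"x" * 1_000_000)  # stand-in for a 1 MB slice of training data
    copies = copying_pipeline(dataset, num_stages=3)
    views = zero_copy_pipeline(dataset, num_stages=3)
    print(bytes_copied(dataset, copies), bytes_copied(dataset, views))  # 3000000 0

    # Update the source in place: views see the change, earlier copies do not.
    dataset[0] = ord("y")
    print(bytes(copies[0][:1]), bytes(views[0][:1]))  # b'x' b'y'
```

With three stages and 1 MB of data, the copying version duplicates 3 MB, while the view-based version duplicates nothing. The trade-off is that every stage now depends on one shared copy of the data staying available and consistent. At cluster scale, managing that sharing is the job of the underlying shared file system.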

“By allowing organizations to simplify their IT stack for demanding AI and GPU-intensive pipelines, providing significant savings and speeds, Weka customers can make it to market quicker than their competitors with a lower spend. Our significant performance improvements also lead to substantial savings in the power required to run the GPU servers, making Weka the most sustainable way to implement large AI projects,” Martin explained.

At the heart of the Weka Data Platform is a scale-out, shared parallel file system called WekaFS. It interfaces directly with Non-Volatile Memory Express (NVMe) drives connected over Peripheral Component Interconnect Express (PCIe), and it handles a wide variety of data types, file sizes and I/O profiles. According to the company, it delivers 10x the performance of legacy network attached storage (NAS) systems and 3x the performance of local server storage.

“The Weka Data Platform is designed for customers with complex data challenges and demanding data environments, including large enterprises, cloud service providers, research institutions, media companies, AI/ML companies and startups, IoT applications and financial services firms that are running performance-intensive next-generation workloads like AI, ML, HPC, quantum computing, 16K media and VFX,” Martin explained. 

On the sustainability front, Martin claims that by improving GPU utilization with dynamic pipelines, the platform allows customers to save 260 tons of CO2e (carbon dioxide equivalent) per petabyte of data stored.

Significant growth and the road ahead

While the gen AI wave took off with the rise of ChatGPT roughly 18 months ago, Weka has been preparing for this moment since its inception. As a result, rather than having to rework its product to meet market needs, as many other players have, the company is aggressively selling to the customers coming to its doorstep.

“We were already focused on modernizing the enterprise data stack by architecting a solution that could support the speed, scale, simplicity and sustainability requirements of modern, performance-intensive workloads like AI/ML. We were not only prepared for the shift but ahead of the curve,” Martin noted.

Currently, the company has more than 300 customers, including 12 of the Fortune 50. Notable AI companies using Weka’s platform include Stability AI, Midjourney, ElevenLabs and the Center for AI Safety, along with AI service providers and GPU clouds such as Iris Energy (IREN), Applied Digital, NexGen Cloud and Yotta. On the financial side, annual recurring revenue (ARR) from the company’s software subscription model has doubled year over year and now exceeds $100 million. Martin projects it will triple or even quadruple this fiscal year.

With this funding, Weka will augment its cash reserves from previous rounds and work towards scaling the business to meet the demand for AI infrastructure in the wake of the generative AI boom. This, Martin said, includes investing in R&D, making enhancements to the data platform and investing in customer success initiatives. 

The company is expecting to expand its global workforce of 400 by at least 25% by the end of this fiscal year. Other notable players competing with Weka in the distributed file system space are VAST Data, Nutanix, IBM, Dell Technologies, Qumulo and Pure Storage.