VAST Data builds its ‘transformative data platform’ for deep-learning AI

VAST Data – a start-up company rapidly building its reputation as ‘the data platform company for the AI era’ – has unveiled its transformative data computing platform designed to be the foundation of artificial intelligence-assisted discovery. Some of the world’s biggest data-driven companies – including Booking.com, NASA, Pixar Animation Studios and Zoom Video Communications – are already on board, and leading Australian companies are watching closely.

The VAST Data Platform is VAST’s global data infrastructure, unifying storage, database and virtualised compute engine services in a scalable system built from the ground up for the future of AI. It sets out the company’s full vision for helping large enterprises worldwide adopt and manage AI successfully.

The platform will, for the first time, combine key elements AI apps need – unified storage, database, and an AI computing engine – in one system.

It will provide an engine for AI research, applications and other work under way in Australia that goes far beyond popularised generative AI. It comes as CSIRO predicts AI could be worth upwards of $315 billion to the Australian economy by 2028.

Graduating beyond Large Language Models to AI-assisted discovery

VAST Data’s research has determined that while generative AI and Large Language Models (LLMs) have introduced the world to the early capabilities of artificial intelligence, LLMs are limited to performing routine tasks like business reporting or reciting information that is already known.

The true promise of AI will be realised when machines can recreate the process of discovery by capturing, synthesising and learning from data – achieving a level of specialisation that used to take decades in a matter of days.

The era of AI-driven discovery will accelerate humanity’s quest to solve its biggest challenges, according to a VAST Data report.

AI can “help industries find treatments for disease and cancers, forge new paths to tackle climate change, pioneer revolutionary approaches to agriculture, and uncover new fields of science and mathematics that the world has not yet even considered”, according to VAST Data.

As such, enterprises are increasingly turning their focus to AI applications, and while organisations can stitch together technologies from disparate public or private cloud offerings, customers require a data platform that simplifies the data management and processing experience into one unified stack.

According to VAST Data, “Today’s existing data platforms have become popular for global enterprises, dramatically reducing infrastructure deployment complexity for business intelligence and reporting applications, but are not built to meet the needs of new deep learning applications.

“This next generation of AI infrastructure must deliver parallel file access, GPU (graphics processing unit)-optimised performance for neural network training and inference on unstructured data, and a global namespace spanning hybrid multi-cloud and edge environments, all unified within one easy-to-manage offering in order to enable federated deep learning.”

Introducing the VAST Data Platform

The foundation of this next era of AI computing can only be built by resolving fundamental infrastructure trade-offs that have previously limited applications from computing on and understanding datasets from global infrastructure in real time. To bring deep learning to data, VAST Data is introducing the VAST Data Platform.

The VAST Data Platform was built with the entire spectrum of natural data in mind – unstructured and structured data types in the form of video, imagery, free text, data streams and instrument data – generated from all over the world and processed against an entire global data corpus in real time.

This approach aims to close the gap between event-driven and data-driven architectures by providing the ability to:

  • Access and process data in any private or major public cloud data centre;
  • Understand natural data by embedding a queryable semantic layer into the data itself;
  • Continuously and recursively compute data in real time, evolving with each interaction.

For more than seven years, VAST has been building toward a vision that puts data – natural data, rich metadata, functions and triggers – at the centre of the VAST Disaggregated Shared-Everything (DASE) distributed systems architecture.

DASE lays the data foundation for deep learning by eliminating trade-offs of performance, capacity, scale, simplicity and resilience to make it possible to train models on all of an enterprise’s data. By allowing customers to now add logic to the system, machines can continuously and recursively enrich and understand data from the natural world.

A unified global DataStore, database and AI computing engine

To capture and serve data from the natural world, VAST first engineered the foundation of its platform, the VAST DataStore, as a scalable storage architecture for unstructured data that eliminates storage tiering.

Exposing enterprise file storage and object storage interfaces, the VAST DataStore is an enterprise network attached storage platform built to meet the needs of today’s powerful AI computing architectures, such as NVIDIA DGX SuperPOD AI supercomputers, as well as big-data and high-performance computing (HPC) platforms.

The exabyte-scale DataStore is built with best-in-class system efficiency to bring archive economics to flash infrastructure, which also makes it suitable for archive applications. Resolving the cost of flash storage has been critical to laying the foundation for deep learning for enterprise customers as they look to train models on their proprietary data assets.

To date, VAST has managed more than 10 exabytes of data globally with leading customers including Booking.com, NASA, Pixar Animation Studios, Zoom Video Communications, Inc, and others.

To apply structure to unstructured natural data, VAST has added a semantic database layer natively into the system with the introduction of the VAST DataBase. Applying first-principles simplification of structured data – by combining the characteristics of a database, a data warehouse and a data lake all in one simple, distributed and unified database management system – VAST has resolved the trade-offs between transactions (to capture and catalogue natural data in real time) and analytics (to analyse and correlate data in real time).

Designed for rapid data capture and fast queries at any scale, the VAST DataBase is the first system to break the barriers of real-time analytics from the event stream all the way to the archive.

With a foundation for synthesised structured and unstructured data, the VAST Data Platform then makes it possible to refine and enrich raw unstructured data into structured, queryable information with the addition of support for functions and triggers.

The VAST DataEngine is a global function execution engine that consolidates data centres and cloud regions into one global computational framework. The engine supports popular programming languages, such as SQL and Python, and introduces an event notification system as well as materialised and reproducible model training that make it easier to manage AI pipelines.

VAST DataSpace provides a ‘safe’ integrated space

The final element of the VAST Data Platform strategy is the VAST DataSpace, a global namespace that permits every location to store, retrieve and process data from any location with high performance while enforcing strict consistency across every access point.

With the DataSpace, the VAST Data Platform is deployable in on-premises data centres and edge environments, and now also extends DataSpace access to leading public cloud platforms including AWS, Microsoft Azure and Google Cloud.

This global, data-defined computing platform takes a new approach to marrying unstructured data with structured data by storing, processing and distributing that data from a single, unified system.

For enterprise AI and LLM systems to drive new discoveries and understandings, they require:

  • Direct access to the natural world through the VAST DataSpace, eliminating reliance on slow and inaccurate translations;
  • The ability to store immense amounts of natural unstructured data in an accessible manner, through the VAST DataStore;
  • The intelligence to transform unstructured raw data into an understanding of its underlying characteristics, through the VAST DataEngine;
  • And finally, a way to build on all of an organisation’s global knowledge, query it, and generate a better understanding of it, through the VAST DataBase.

“We’ve been working toward this moment since our first days, and we’re incredibly excited to unveil the world’s first data platform built from the ground up for the next generation of AI-driven discovery,” said Renen Hallak, CEO and co-founder of VAST Data.

“Encapsulating the ability to create and catalogue understanding from natural data on a global scale, we’re consolidating entire IT infrastructure categories to enable the next era of large-scale data computation. With the VAST Data Platform, we are democratising AI abilities and enabling organisations to unlock the true value of their data.”

The VAST DataStore, DataBase and DataSpace are generally available within the VAST Data Platform today, and the VAST DataEngine will be made available in 2024.

VAST Data’s approach and the results it has generated have become a talking point in scientific, academic and business circles alike.

Max Tegmark, Massachusetts Institute of Technology

“To be really impactful in this era of AI and deep learning, you want not only to have lots of data, but high quality data that is correctly organised and available at the right place at the right time,” said MIT professor and AI researcher Max Tegmark.

“As long as we manage its potential risk, AI will bring an immense upside, helping us solve many of the problems that have stumped humanity so far, from curing diseases to eliminating poverty and stabilising our climate. It’s incredibly inspiring, so let's not squander the amazing opportunities that this era of AI-enabled possibilities offers.” 

Watch Professor Max Tegmark’s extended video comment here

Eric Bermender, Pixar Animation Studios, California

“VAST is allowing us to put all of our rendered assets on one tierless cluster of storage, which offers us the ability to use these petabytes of data as training data for future AI applications,” said Eric Bermender, head of data centre and IT infrastructure at Pixar Animation Studios.

“We’ve already moved all of our denoising data, ‘finals’ and ‘takes’ data sets onto the VAST Data Platform, specifically because of the AI capabilities this allows us to take advantage of in the future.” 

Watch Eric Bermender’s extended video comment here

Vijay Parthasarathy, Zoom Video Communications, California

“AI is a big priority for us here at Zoom, and we’re working with VAST on efficiently building and training our AI/ML models across multiple unstructured datasets of video, audio and text data,” said Vijay Parthasarathy, head of AI/ML at Zoom.

“Automation is the key, and the VAST Data Platform allows us to build beyond the capabilities that we’ve already built to deliver a frictionless global communication experience.” 

Watch Vijay Parthasarathy’s extended video comment here

Manuvir Das, NVIDIA Corporation, California

“As data is the fuel for AI, enterprises need modern data architectures to position themselves for success amid the greatest technology shift of our time,” said Manuvir Das, vice president for Enterprise Computing at NVIDIA.

“VAST’s new platform provides powerful integration with NVIDIA DGX AI supercomputing to provide companies with a comprehensive solution for transforming their data into powerful generative AI applications.” 

Watch Manuvir Das’s extended video comment here

Ritu Jyoti, International Data Corporation (IDC), Massachusetts

“According to IDC Worldwide AI Spending Guide, Feb (2023 V1), global spending on AI-centric systems continues to grow at double-digit rates, reaching a five-year (2021-2026) CAGR (compound annual growth rate) of 27 percent and will exceed $308 billion by 2026,” said Ritu Jyoti, group vice president, AI and Automation Research Practice at IDC.

“Data is foundational to AI systems, and the success of AI systems depends crucially on the quality of the data, not just their size.

“With a novel systems architecture that spans a multi-cloud infrastructure, VAST is laying the foundation for machines to collect, process and collaborate on data at a global scale in a unified computing environment – and opening the door to AI-automated discovery that can solve some of humanity's most complex challenges.”

###

Launched in 2019, VAST is regarded as the fastest-growing data infrastructure startup in history. VAST Data is the data platform software company set up specifically to lead business into the AI era.

Accelerating time-to-insight for workload-intensive applications, the VAST Data Platform delivers scalable performance, radically simple data management and enhanced productivity.

For more information on VAST’s vision for the future of AI, including keynote presentations from VAST executives and industry influencers, and video testimonials from customers and partners, visit BuildBeyond.ai.

https://vastdata.com

 

ends
