Hewlett Packard Enterprise Co. is acquiring Pachyderm Inc., a startup with a software platform designed to speed up artificial intelligence projects.
HPE announced the transaction on Thursday. It’s expected to close by the end of the month, after which HPE will integrate Pachyderm’s platform with its AI software portfolio. San Francisco-based Pachyderm previously raised $28.1 million from investors.
Enterprise software teams develop AI models with the help of training datasets. After a new neural network is built, it’s given the task of analyzing a training dataset until it learns to identify patterns of interest in the information. Once the neural network achieves a sufficiently high level of accuracy, it’s deployed in production to process live information.
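The train-until-accurate loop described above can be illustrated with a minimal Python sketch. It assumes a single perceptron, the simplest one-neuron network, and a toy dataset (the logical AND function). Real models are vastly larger and use gradient-based training. For brevity, accuracy is measured on the training data itself, whereas in practice it is usually checked on held-out data. All names here are hypothetical.

```python
# Toy training set (hypothetical): the logical AND function, input pair -> label.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, x):
    """Output 1 when the weighted sum of the inputs plus the bias is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of examples the model labels correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)


def train(data, target_accuracy=1.0, max_epochs=100):
    """Pass over the data repeatedly, nudging the weights after each mistake,
    until accuracy reaches the target or the epoch budget runs out."""
    weights = [0] * len(data[0][0])
    bias = 0
    for epoch in range(1, max_epochs + 1):
        for x, y in data:
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
        # Stop once the model is "accurate enough" to be deployed.
        if accuracy(weights, bias, data) >= target_accuracy:
            return weights, bias, epoch
    return weights, bias, max_epochs


weights, bias, epochs = train(DATA)
print(f"reached {accuracy(weights, bias, DATA):.0%} accuracy after {epochs} epochs")
```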
The training datasets that engineers use to hone AI models’ accuracy often can’t be processed in their original form. Before using a training dataset, software teams have to filter out any duplicate and erroneous records it may contain. The preparation process often also includes other tasks, such as converting the information into a format that requires less hardware to process.
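The cleanup described above can be sketched in a few lines of Python. The record layout (`id`, `text`, `label`), the set of valid labels, and the rule that two records are duplicates when their normalized text and label match are all hypothetical assumptions for illustration. This is not how Pachyderm or any particular team defines these rules.

```python
from typing import Iterable

# Hypothetical schema: every record needs these fields, and its label must
# come from a known set. Real validation rules depend on the dataset.
REQUIRED_FIELDS = ("id", "text", "label")
VALID_LABELS = {"positive", "negative"}


def is_valid(record: dict) -> bool:
    """Return True if the record has all required fields and a known label."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    return record["label"] in VALID_LABELS


def clean(records: Iterable[dict]) -> list[dict]:
    """Drop erroneous records, then drop duplicates (same normalized text and label)."""
    seen: set[tuple[str, str]] = set()
    cleaned = []
    for record in records:
        if not is_valid(record):
            continue
        key = (record["text"].strip().lower(), record["label"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


raw = [
    {"id": 1, "text": "Great product", "label": "positive"},
    {"id": 2, "text": "great product ", "label": "positive"},  # duplicate after normalization
    {"id": 3, "text": "Broke in a day", "label": "negative"},
    {"id": 4, "text": "", "label": "negative"},                 # missing text
    {"id": 5, "text": "Meh", "label": "neutral"},               # unknown label
]
print([r["id"] for r in clean(raw)])  # [1, 3]
```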
Preparing AI training datasets is carried out using automated workflows known as data pipelines. Pachyderm offers a platform that makes it easier to build such pipelines. The software can run on the major public clouds as well as on companies’ on-premises infrastructure.
Pachyderm enables developers to write scripts that automate individual data preparation tasks such as duplicate record removal. Developers can then combine those scripts into a data pipeline. Pachyderm runs pipelines using the Kubernetes container orchestration engine, which enables it to automatically add or remove hardware resources according to an AI project’s requirements.
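The idea of chaining single-purpose scripts into a pipeline can be sketched in plain Python, with each step modeled as a function that takes and returns a list of records. This is a hypothetical, in-process illustration only: Pachyderm runs pipeline steps in containers on Kubernetes and defines pipelines through its own specifications, none of which is reproduced here. The step names and record fields are made up for the example.

```python
from functools import reduce
from typing import Callable

Records = list[dict]
Step = Callable[[Records], Records]


def drop_missing_values(records: Records) -> Records:
    """Remove records whose 'value' field is missing."""
    return [r for r in records if r.get("value") is not None]


def drop_duplicates(records: Records) -> Records:
    """Keep only the first record seen for each 'id'."""
    seen = set()
    out = []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out


def scale(records: Records) -> Records:
    """Divide every value by the largest absolute value (1 if there is none)."""
    peak = max((abs(r["value"]) for r in records), default=0) or 1
    return [{**r, "value": r["value"] / peak} for r in records]


def build_pipeline(*steps: Step) -> Step:
    """Combine steps so each one's output becomes the next one's input."""
    return lambda records: reduce(lambda acc, step: step(acc), steps, records)


pipeline = build_pipeline(drop_missing_values, drop_duplicates, scale)
data = [
    {"id": "a", "value": 4.0},
    {"id": "a", "value": 4.0},
    {"id": "b", "value": None},
    {"id": "c", "value": 2.0},
]
print(pipeline(data))  # [{'id': 'a', 'value': 1.0}, {'id': 'c', 'value': 0.5}]
```

Keeping each step small and independent is what makes it possible to reuse, reorder, or scale steps separately.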
The startup says its platform can process terabytes of data per AI project. It can ingest structured information such as spreadsheets, as well as server logs and other types of files.
Pachyderm creates a record of the changes that data pipelines make to the information they ingest. By evaluating this record, engineers can identify potential technical issues in a pipeline. Pachyderm says that its platform also provides the ability to reproduce the results of past AI projects, which makes it easier to check their accuracy.
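The general idea of recording what each pipeline step did, and using that record to check that a past result can be reproduced, can be sketched in Python. This is a simplified, hypothetical illustration of data lineage, not Pachyderm's mechanism. The `fingerprint` and `run_with_lineage` helpers and the example steps are invented for the sketch, and content hashes (SHA-256 over a canonical JSON encoding) are an assumed way to identify datasets.

```python
import hashlib
import json
from typing import Callable

Records = list[dict]


def fingerprint(records: Records) -> str:
    """Content hash of a dataset: identical data always yields the same digest."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_lineage(records: Records, steps: list[Callable[[Records], Records]]):
    """Run steps in order, logging which data each step consumed and produced."""
    lineage = []
    for step in steps:
        output = step(records)
        lineage.append({
            "step": step.__name__,
            "input": fingerprint(records),
            "output": fingerprint(output),
            "rows_in": len(records),
            "rows_out": len(output),
        })
        records = output
    return records, lineage


def drop_negative(records: Records) -> Records:
    return [r for r in records if r["value"] >= 0]


def double(records: Records) -> Records:
    return [{**r, "value": r["value"] * 2} for r in records]


data = [{"value": 3}, {"value": -1}, {"value": 5}]
result, log = run_with_lineage(data, [drop_negative, double])
for entry in log:
    # Row counts per step help spot a step that drops or adds data unexpectedly.
    print(entry["step"], entry["rows_in"], "->", entry["rows_out"])

# Re-running on the same input reproduces the same fingerprints, so a past
# result can be verified by comparing digests.
_, log_again = run_with_lineage(data, [drop_negative, double])
assert log == log_again
```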
“As AI projects become larger and increasingly involve complex data sets, data scientists will need reproducible AI solutions to efficiently maximize their machine learning initiatives, optimize their infrastructure cost, and ensure data is reliable and safe no matter where they are in their AI journey,” said Justin Hotard, the executive vice president and general manager of HPE’s high-performance computing and AI division.
HPE plans to integrate Pachyderm with its Machine Learning Development System, a software platform for training AI models. The platform is based on technology that HPE obtained through an earlier startup acquisition.