SageMaker aims to provide a machine learning service that can be used to build, train, and deploy ML models for virtually any use case.
Amazon Sagemaker Pipelines
Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker. SageMaker aims to provide a machine learning service that can be used to build, train, and deploy ML models for virtually any use case. During this year’s re Invent conference, AWS made several announcements to further improve SageMaker’s capabilities.
Swami Sivasubramanian, VP of Amazon Machine Learning at AWS, said:
Hundreds of thousands of everyday developers and data scientists have used our industry-leading machine learning service, Amazon SageMaker, to remove barriers to building, training, and deploying custom machine learning models. One of the best parts about having such a widely-adopted service like SageMaker is that we get lots of customer suggestions which fuel our next set of deliverables.
Today, we are announcing a set of tools for Amazon SageMaker that makes it much easier for developers to build end-to-end machine learning pipelines to prepare, build, train, explain, inspect, monitor, debug, and run custom machine learning models with greater visibility, explainability, and automation at scale.
The first announcement is Data Wrangler, a feature that aims to automate the preparation of data for machine learning. Data Wrangler enables customers to choose the data they want from their various data stores and import it with a single click. Over 300 built-in data transformers are included to help customers normalize, transform, and combine features without having to write any code.
Frank Farrall, Principal of AI Ecosystems and Platforms Leader at Deloitte, comments:
SageMaker Data Wrangler enables us to hit the ground running to address our data preparation needs with a rich collection of transformation tools that accelerate the process of machine learning data preparation needed to take new products to market.
In turn, our clients benefit from the rate at which we scale deployments, enabling us to deliver measurable, sustainable results that meet the needs of our clients in a matter of days rather than months.
The second announcement is Feature Store. Amazon SageMaker Feature Store provides a new repository that makes it easy to store, update, retrieve, and share machine learning features for training and inference.
Feature Store aims to overcome the problem of storing features that are mapped to multiple models. A purpose-built feature store helps developers to access and share features that make it much easier to name, organize, find, and share sets of features among teams of developers and data scientists. Because it resides in SageMaker Studio – close to where ML models are run – AWS claims it provides single-digit millisecond inference latency.
Mammad Zadeh, VP of Engineering, Data Platform at Intuit, says:
We have worked closely with AWS in the lead up to the release of Amazon SageMaker Feature Store, and we are excited by the prospect of a fully managed feature store so that we no longer have to maintain multiple feature repositories across our organization.
Our data scientists will be able to use existing features from a central store and drive both standardisation and reuse of features across teams and models.
Next up, we have SageMaker Pipelines—which claims to be the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning.
Developers can define each step of an end-to-end machine learning workflow including the data-load steps, transformations from Amazon SageMaker Data Wrangler, features stored in Amazon SageMaker Feature Store, training configuration and algorithm set up, debugging steps, and optimization steps.
SageMaker Clarify may be one of the most important features being debuted by AWS this week considering ongoing events.
Clarify aims to provide bias detection across the machine learning workflow, enabling developers to build greater fairness and transparency into their ML models. Rather than turn to often time-consuming open-source tools, developers can use the integrated solution to quickly try and counter any bias in models.
Andreas Heyden, Executive VP of Digital Innovations for the DFL Group, says:
Amazon SageMaker Clarify seamlessly integrates with the rest of the Bundesliga Match Facts digital platform and is a key part of our long-term strategy of standardising our machine learning workflows on Amazon SageMaker.
By using AWS’s innovative technologies, such as machine learning, to deliver more in-depth insights and provide fans with a better understanding of the split-second decisions made on the pitch, Bundesliga Match Facts enables viewers to gain deeper insights into the key decisions in each match.
Deep Profiling for Amazon SageMaker automatically monitors system resource utilization and provides alerts were required for any detected training bottlenecks. The feature works across frameworks (PyTorch, Apache MXNet, and TensorFlow) and collects system and training metrics automatically without requiring any code changes in training scripts.
Next up, we have Distributed Training on SageMaker which AWS claims make it possible to train large, complex deep learning models up to two times faster than current approaches.
Kristóf Szalay, CTO at Turbine, comments:
We use machine learning to train our in silico human cell model, called Simulated Cell, based on a proprietary network architecture. By accurately predicting various interventions on the molecular level, Simulated Cell helps us to discover new cancer drugs and find combination partners for existing therapies.
Training of our simulation is something we continuously iterate on, but on a single machine each training takes days, hindering our ability to iterate on new ideas quickly.
We are very excited about Distributed Training on Amazon SageMaker, which we are expecting to decrease our training times by 90% and to help us focus on our main task: to write a best-of-the-breed codebase for the cell model training.
Amazon SageMaker ultimately allows us to become more effective in our primary mission: to identify and develop novel cancer drugs for patients.
SageMaker’s Data Parallelism engine scales training jobs from a single GPU to hundreds or thousands by automatically splitting data across multiple GPUs, improving training time by up to 40 percent.
With edge computing advancements increasing rapidly, AWS is keeping pace with SageMaker Edge Manager.
Edge Manager helps developers to optimize, secure, monitor, and maintain ML models deployed on fleets of edge devices. In addition to helping optimize ML models and manage edge devices, Edge Manager also provides the ability to cryptographically sign models, upload prediction data from devices to SageMaker for monitoring and analysis, and view a dashboard that tracks and provided a visual report on the operation of the deployed models within the SageMaker console.
Igor Bergman, VP of Cloud and Software of PCs and Smart Devices at Lenovo, comments:
SageMaker Edge Manager will help eliminate the manual effort required to optimise, monitor, and continuously improve the models after deployment. With it, we expect our models will run faster and consume less memory than with other comparable machine-learning platforms.
As we extend AI to new applications across the Lenovo services portfolio, we will continue to require a high-performance pipeline that is flexible and scalable both in the cloud and on millions of edge devices. That’s why we selected the Amazon SageMaker platform. With its rich edge-to-cloud and CI/CD workflow capabilities, we can effectively bring our machine learning models to any device workflow for much higher productivity.
Finally, SageMaker JumpStart aims to make it easier for developers which have little experience with machine learning deployments to get started.
JumpStart provides developers with an easy-to-use, searchable interface to find best-in-class solutions, algorithms, and sample notebooks. Developers can select from several end-to-end machine learning templates(e.g. fraud detection, customer churn prediction, or forecasting) and deploy them directly into their SageMaker Studio environments.
AWS has been on a roll with SageMaker improvements—delivering more than 50 new capabilities over the past year. After this bumper feature drop, we probably shouldn’t expect any more until we’ve put 2020 behind us.