Data-driven Protein Engineering

Discover functional protein sequences optimized to your specifications

Intuitive and accessible web app

Machine Learning-driven protein engineering at your fingertips

Deploy state-of-the-art ML models based on your sequence and function data to generate new, more diverse variants. No specialized skills required.

OpenProtein.AI Web App Interface

Machine learning-guided mutagenesis

Powerful analytical tools to increase your success rate over standard mutagenesis

The OpenProtein.AI web app provides a suite of software tools to generate novel variant libraries and predict their success over multiple functions of interest. Visualize your mutagenesis data, train machine learning models for functions of interest, define your design objectives, and build optimized variant libraries.

Machine Learning Mutagenesis Interface

Convenient, reliable data management

Track your mutagenesis process and manage your data all in one place

Streamline your research process with advanced in-app data management capabilities. OpenProtein.AI is a secure data repository for large mutagenesis datasets.

Data Management Interface

Data-driven protein engineering

Unlock your data's full potential

OpenProtein.AI mines natural sequence databases and learns from your experimental data to accelerate the iterative design process. Design variants with significantly enhanced activity compared to standard directed mutagenesis.

3D visualization showing ML extrapolated fitness landscape with optimized proteins, training data, and protein sequence space

Experimental efficiency

Optimize multiple properties simultaneously

OpenProtein.AI can improve multiple properties simultaneously to reduce experimental iterations. Every subsequent round and project benefits from previous data.

Dashboard showing optimization of multiple protein properties with experimental data visualization

Sequence-to-function mapping

Predict functions of interest, identify mutagenesis hotspots, and design combinatorial variant libraries

Develop and deploy models based on your data to predict activity for any input sequence, and map all single-site substitutions to identify linchpin positions for site-saturation mutagenesis. Visualize functional predictions for all single-site substitutions and export amino acid distributions for degenerate and combinatorial variant libraries.
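As a rough illustration of what a single-site substitution scan involves, the Python sketch below enumerates every single-site substitution of a parent sequence and ranks positions by the mean predicted score of their substitutions. It is not the OpenProtein.AI implementation or API: the function names, the `toy_predict` scorer, and the three-residue parent sequence are all hypothetical stand-ins, and a real workflow would plug in a model trained on your own data.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def single_site_variants(parent: str) -> List[Tuple[int, str, str]]:
    """Return (position, mutant_residue, variant_sequence) for every
    single-site substitution. Positions are 1-based, as in mutation
    notation such as A2G. Assumes the parent uses canonical residues only.
    """
    variants = []
    for i, wild_type in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                variants.append((i + 1, aa, parent[:i] + aa + parent[i + 1:]))
    return variants


def rank_sites(parent: str, predict: Callable[[str], float]) -> List[Tuple[int, float]]:
    """Rank positions by the mean predicted score of their 19 substitutions,
    highest first. `predict` is any sequence -> score function."""
    totals: Dict[int, float] = {}
    for position, _, sequence in single_site_variants(parent):
        totals[position] = totals.get(position, 0.0) + predict(sequence)
    n_subs = len(AMINO_ACIDS) - 1
    means = [(pos, total / n_subs) for pos, total in totals.items()]
    return sorted(means, key=lambda item: item[1], reverse=True)


def toy_predict(sequence: str) -> float:
    """Hypothetical placeholder scorer: counts tryptophans. Illustration only."""
    return float(sequence.count("W"))


parent = "MKW"  # made-up parent sequence
variants = single_site_variants(parent)
site_ranking = rank_sites(parent, toy_predict)
```

Ranking by the mean score is only one possible summary; the maximum score per position, or the full 20-way distribution at each site, may suit library design better.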

Interface showing sequence analysis with mutagenesis hotspots and variant library design tools

Powered by AI. Inspired by evolution.

Generative protein design with PoET

Design protein sequences de novo, no functional or structural data required

Free for academic use!

PoET Interface Preview

What is PoET?

PoET (Protein Evolutionary Transformer) is an autoregressive retrieval-augmented generative transformer protein language model.

Given a set of sequences representing the evolutionary context, PoET directly infers the fitness landscape on which natural selection acts to optimize proteins under functional constraints on the amino acid sequences. PoET can then generate new sequences from that evolutionary process or score the fitness of arbitrary query sequences under that process.
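To make the idea of scoring a query sequence under an autoregressive model concrete, here is a deliberately tiny Python sketch. It is not PoET: a first-order (bigram) model with add-one smoothing stands in for the transformer, and the context sequences, the `START` symbol, and all function names are assumptions made for illustration. What it shares with any autoregressive model is the scoring rule: the log-likelihood of a sequence is the sum of the log-probabilities of each residue given the residues before it.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START = "^"  # hypothetical start-of-sequence symbol


def fit_bigram(context: List[str], pseudocount: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Estimate p(next residue | previous residue) from the context
    sequences, smoothed so that unseen transitions keep nonzero probability."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in context:
        previous = START
        for aa in seq:
            counts[previous][aa] += 1
            previous = aa
    model: Dict[str, Dict[str, float]] = {}
    for previous in START + AMINO_ACIDS:
        total = sum(counts[previous].values()) + pseudocount * len(AMINO_ACIDS)
        model[previous] = {
            aa: (counts[previous][aa] + pseudocount) / total for aa in AMINO_ACIDS
        }
    return model


def log_likelihood(seq: str, model: Dict[str, Dict[str, float]]) -> float:
    """Chain-rule score: sum of log p(residue | previous residue)."""
    total = 0.0
    previous = START
    for aa in seq:
        total += math.log(model[previous][aa])
        previous = aa
    return total


context = ["MKVL", "MKIL", "MRVL"]  # made-up evolutionary context
model = fit_bigram(context)
score_similar = log_likelihood("MKVL", model)    # resembles the context
score_unrelated = log_likelihood("WWWW", model)  # does not
```

In this toy setting the sequence that resembles the context receives the higher log-likelihood, which is the behaviour that makes likelihoods usable as a ranking signal for variants.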

PoET Process Diagram

Generate novel, functional, and diverse sequences

PoET samples efficiently from the learned evolutionary process

Generate Sequences Demo

Sequence Analysis

Analyze the fitness landscape and prioritize variants

Given a parent sequence, explore the local fitness landscape or rank specific variants to design focused mutagenesis libraries

Fitness Landscape Analysis

Sequence-to-function mapping

PoET is simple to use and works out of the box

Intuitive workflows are quick and easy to use. Results are returned in minutes and can be exported in multiple formats.

Sequence to Function Mapping

Tailor your designs

Specialize PoET to your applications

Define your evolutionary context through prompt customization. Use any sequence database with custom MSAs. Adjust diversity of the model with in-software similarity level settings.

Customize PoET

State-of-the-art variant effect prediction

Validated on 90 different deep mutational scanning datasets

PoET provides state-of-the-art de novo variant function predictions across a wide range of

  • protein families,
  • organisms of origin,
  • properties of interest, and
  • MSA depths.

PoET can model

  • substitutions, insertions, and deletions,
  • single and higher-order variants.

Performance Comparison Table

Performance is measured as the rank correlation between variant likelihoods and measured function. N/A is reported for models that cannot predict indels.
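For readers who want to run this kind of evaluation on their own data, the sketch below computes a Spearman rank correlation between model scores and measured function using only the Python standard library. The numbers are made up for illustration and are not results from PoET or any benchmark; the function names are also our own.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up example values: model log-likelihoods and assay measurements
# for five variants, listed in the same order.
log_likelihoods = [-12.1, -10.4, -15.3, -9.8, -11.0]
measured_function = [0.40, 0.75, 0.10, 0.70, 0.55]
rho = spearman(log_likelihoods, measured_function)
```

Because only ranks matter, the scores and the measurements do not need to share a scale. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.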

Enhanced mutagenesis workflow

Engineer better proteins, faster!

Variant Library Design Features

  • Evolutionary sequence analysis
  • Generative protein language models
  • Identify mutagenesis hotspots
  • Design combinatorial variant libraries
  • Optimize variant libraries for multiple design objectives

Variant Fitness Predictions

  • Train models to predict function(s) from your mutagenesis data
  • Predict variant sequence activity for functions of interest
  • Perform single-site substitution, deletion, and insertion analyses
  • Create likelihood-activity relationship generative models

Actionable Results

  • Identify target substitution, insertion, and deletion sites
  • Design single or higher-order variants with enhanced activity
  • With statistical coupling analysis, discover areas with high potential for epistasis