Overview: module code
All modules for which code is available
datasail.eval
datasail.sail