relevanceai.operations.text_finetuning.supervised_finetuning_ops
Warning
This is a beta feature and may change in future releases. Do not use it in production systems.
Example
Train a text model using a triplet loss.
The available triplet loss variants are documented at https://www.sbert.net/docs/package_reference/losses.html
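from relevanceai import Client
from relevanceai.operations.text_finetuning.supervised_finetuning_ops import SupervisedTripleLossFinetuneOps

client = Client()
ds = client.Dataset("ecommerce")
# Build the fine-tuning operation from an existing dataset
ops = SupervisedTripleLossFinetuneOps.from_dataset(
    dataset=ds,
    base_model="distilbert-base-uncased",
    chunksize=16,
    triple_loss_type="BatchHardSoftMarginTripletLoss",
)
# output_dir is passed by keyword; "trained_model" is the documented default
ops.run(
    text_field="detail_desc",
    label_field="_cluster_.desc_use_vector_.kmeans-10",
    output_dir="trained_model",
)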
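For intuition about what a triplet loss optimises, here is a minimal pure-Python sketch of the soft-margin form, log(1 + exp(d(a, p) - d(a, n))), for a single (anchor, positive, negative) triple. It is not the library's implementation: the function names and the toy vectors are hypothetical, and the batch-hard variants additionally select the hardest positive and hardest negative within each batch.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def soft_margin_triplet_loss(anchor, positive, negative):
    """Soft-margin triplet loss for one triple (illustrative only).

    The loss is small when the positive is closer to the anchor than
    the negative, and grows as that ordering is violated.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return math.log1p(math.exp(d_pos - d_neg))


# Toy embeddings (hypothetical values, not taken from any model)
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

well_separated = soft_margin_triplet_loss(anchor, positive, negative)
poorly_separated = soft_margin_triplet_loss(anchor, negative, positive)
print(well_separated < poorly_separated)
```

Swapping the positive and negative makes the loss larger, which is the signal fine-tuning uses to pull same-label texts together and push different-label texts apart.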
Module Contents
- class relevanceai.operations.text_finetuning.supervised_finetuning_ops.SupervisedTripleLossFinetuneOps(dataset, base_model: str = 'sentence-transformers/all-mpnet-base-v2', triple_loss_type: str = 'BatchHardSoftMarginTripletLoss', chunksize: int = 32, save_best_model: bool = True, credentials: Optional[relevanceai.client.helpers.Credentials] = None)
Batch API client
- define_loss(self)
- define_evaluator(self, text_data: List[str], labels: List[int], name='supervised_finetune_dev_eval')
- static build_triple_data(text_data, labels)
- prepare_data_for_finetuning(self, text_data: List[str], labels: List[int])
- fine_tune(self, train_data: List, dev_data: List = None, epochs: int = 3, output_path: str = 'trained_model')
- get_model(self, output_path: Optional[str] = None)
- fetch_text_and_labels_from_dataset(self, text_field, label_field)
- run(self, text_field: str, label_field: str, epochs: int = 3, output_dir: str = 'trained_model', percentage_for_dev: float = None)
Run supervised fine-tuning of a model using a triplet loss.
Example
from relevanceai import Client
from relevanceai.operations.text_finetuning.supervised_finetuning_ops import SupervisedTripleLossFinetuneOps

client = Client()
ds = client.Dataset("quickstart")
ops = SupervisedTripleLossFinetuneOps.from_dataset(ds)
# output_dir is passed by keyword; "trained_model" is the documented default
ops.run(
    text_field="detail_desc",
    label_field="_cluster_.desc_use_vector_.kmeans-10",
    output_dir="trained_model",
)
- Parameters
text_field (str) – The dataset field to use as input text for fine-tuning
label_field (str) – The dataset field that holds the class label of each input
epochs (int) – The number of training epochs. Defaults to 3
output_dir (str) – The path of the output directory. Defaults to 'trained_model'
percentage_for_dev (float) – A number between 0 and 1 giving the fraction of the data to hold out for evaluation. If None, no evaluation is run
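To illustrate what a `percentage_for_dev` value means in practice, the sketch below holds out a fraction of labelled examples for evaluation. The helper `split_train_dev` and its `seed` argument are hypothetical and are not part of the relevanceai API; how `run` actually performs its split is not documented here, so treat this only as one plausible reading of the parameter.

```python
import random


def split_train_dev(texts, labels, percentage_for_dev=None, seed=0):
    """Split parallel lists of texts and labels into train and dev parts.

    Hypothetical helper: returns ((train_texts, train_labels),
    (dev_texts, dev_labels)). With percentage_for_dev=None, everything
    is used for training and the dev part is empty.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if percentage_for_dev is None:
        return (list(texts), list(labels)), ([], [])
    if not 0 < percentage_for_dev < 1:
        raise ValueError("percentage_for_dev must be between 0 and 1")

    # Shuffle indices reproducibly so the split does not depend on input order
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)

    n_dev = int(len(indices) * percentage_for_dev)
    dev_idx, train_idx = indices[:n_dev], indices[n_dev:]

    train = ([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    dev = ([texts[i] for i in dev_idx], [labels[i] for i in dev_idx])
    return train, dev


texts = [f"item {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
(train_texts, train_labels), (dev_texts, dev_labels) = split_train_dev(
    texts, labels, percentage_for_dev=0.2
)
print(len(train_texts), len(dev_texts))
```

With ten examples and `percentage_for_dev=0.2`, two examples are held out and eight remain for training.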
- classmethod from_client(self, client, *args, **kwargs)
- classmethod from_dataset(self, dataset: Any, base_model: str = 'distilbert-base-uncased', **kwargs)
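The page does not describe what `build_triple_data` or `prepare_data_for_finetuning` return. As a rough illustration of how class labels can be turned into training triples, the sketch below groups texts by label and emits (anchor, positive, negative) tuples. The function `build_triples` and its pairing strategy are assumptions made for this example and should not be read as the library's behaviour.

```python
from collections import defaultdict


def build_triples(texts, labels):
    """Form (anchor, positive, negative) triples from labelled texts.

    Hypothetical helper: the positive shares the anchor's label and the
    negative comes from any other label. Labels with fewer than two
    texts are skipped because they cannot supply a positive.
    """
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    triples = []
    for label, members in by_label.items():
        # Every text carrying a different label is a candidate negative
        negatives = [
            text
            for other, items in by_label.items()
            if other != label
            for text in items
        ]
        if len(members) < 2 or not negatives:
            continue
        for i, anchor in enumerate(members):
            positive = members[(i + 1) % len(members)]
            negative = negatives[i % len(negatives)]
            triples.append((anchor, positive, negative))
    return triples


texts = ["red shirt", "blue shirt", "leather boot", "suede boot"]
labels = [0, 0, 1, 1]
triples = build_triples(texts, labels)
print(len(triples))
```

Each labelled text serves as an anchor once, so the four toy texts yield four triples.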