relevanceai.operations.text_finetuning.supervised_finetuning_ops

Warning

This is a beta feature and is subject to change. Do not use it in production systems.

Example

Train a text model using TripletLoss

You can find out more about the different types of TripletLoss at https://www.sbert.net/docs/package_reference/losses.html

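from relevanceai import Client
from relevanceai.operations.text_finetuning.supervised_finetuning_ops import SupervisedTripleLossFinetuneOps

client = Client()
ds = client.Dataset("ecommerce")

# Build the fine-tuning operation from an existing dataset
ops = SupervisedTripleLossFinetuneOps.from_dataset(
    dataset=ds,
    base_model="distilbert-base-uncased",
    chunksize=16,
    triple_loss_type="BatchHardSoftMarginTripletLoss",
)
# output_dir is shown with its documented default value
ops.run(
    text_field="detail_desc",
    label_field="_cluster_.desc_use_vector_.kmeans-10",
    output_dir="trained_model",
)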
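
To give a feel for what the `triple_loss_type` value `BatchHardSoftMarginTripletLoss` refers to, here is a minimal pure-Python sketch of a batch-hard, soft-margin triplet loss. It is an illustration only, not the code used by relevanceai or sentence-transformers. The function names `euclidean` and `batch_hard_soft_margin_loss` are hypothetical, and skipping anchors that lack a positive or a negative is a simplification made for this sketch.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_soft_margin_loss(
    embeddings: List[Sequence[float]], labels: List[int]
) -> float:
    """Illustrative batch-hard soft-margin triplet loss (hypothetical helper).

    For each anchor, take the farthest item with the same label (hardest
    positive) and the closest item with a different label (hardest
    negative), then average log(1 + exp(d_pos - d_neg)) over all anchors
    that have at least one positive and one negative in the batch.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos_dists = []
        neg_dists = []
        for j, other in enumerate(embeddings):
            if i == j:
                continue  # an item is never its own positive
            dist = euclidean(anchor, other)
            if labels[j] == labels[i]:
                pos_dists.append(dist)
            else:
                neg_dists.append(dist)
        if not pos_dists or not neg_dists:
            continue  # simplification: skip anchors without a valid triplet
        losses.append(math.log1p(math.exp(max(pos_dists) - min(neg_dists))))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two tight groups that are far apart from each other
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
print(batch_hard_soft_margin_loss(points, [0, 0, 1, 1]))  # labels match the groups
print(batch_hard_soft_margin_loss(points, [0, 1, 0, 1]))  # labels cut across the groups
```

The loss is small when items sharing a label are already close together and far from other labels, and it grows when the labels cut across the geometry. Fine-tuning adjusts the embeddings to reduce it.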

Module Contents

class relevanceai.operations.text_finetuning.supervised_finetuning_ops.SupervisedTripleLossFinetuneOps(dataset, base_model: str = 'sentence-transformers/all-mpnet-base-v2', triple_loss_type: str = 'BatchHardSoftMarginTripletLoss', chunksize: int = 32, save_best_model: bool = True, credentials: Optional[relevanceai.client.helpers.Credentials] = None)

Batch API client

define_loss(self)
define_evaluator(self, text_data: List[str], labels: List[int], name='supervised_finetune_dev_eval')
static build_triple_data(text_data, labels)
prepare_data_for_finetuning(self, text_data: List[str], labels: List[int])
fine_tune(self, train_data: List, dev_data: List = None, epochs: int = 3, output_path: str = 'trained_model')
get_model(self, output_path: Optional[str] = None)
fetch_text_and_labels_from_dataset(self, text_field, label_field)
run(self, text_field: str, label_field: str, epochs: int = 3, output_dir: str = 'trained_model', percentage_for_dev: float = None)

Fine-tune a text model in a supervised way using TripletLoss.

Example

from relevanceai import Client
from relevanceai.operations.text_finetuning.supervised_finetuning_ops import SupervisedTripleLossFinetuneOps

client = Client()
ds = client.Dataset("quickstart")

ops = SupervisedTripleLossFinetuneOps.from_dataset(ds)
# output_dir is shown with its documented default value
ops.run(
    text_field="detail_desc",
    label_field="_cluster_.desc_use_vector_.kmeans-10",
    output_dir="trained_model",
)

Parameters
  • text_field (str) – The field you want to use as input text for fine-tuning

  • label_field (str) – The field indicating the classes of the input

  • output_dir (str) – The path of the output directory

  • percentage_for_dev (float) – A number between 0 and 1 giving the fraction of the data to hold out for evaluation. If None, no evaluation is performed.
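
As a rough illustration of what a `percentage_for_dev` value means, the sketch below holds out a fraction of the (text, label) pairs for evaluation. The helper `split_train_dev` and its `seed` parameter are hypothetical and are not part of relevanceai. The library's own split may differ, for example in whether it shuffles or stratifies by label.

```python
import random
from typing import List, Optional, Tuple


def split_train_dev(
    text_data: List[str],
    labels: List[int],
    percentage_for_dev: Optional[float] = None,
    seed: int = 0,
) -> Tuple[List[str], List[int], List[str], List[int]]:
    """Hypothetical helper: hold out a fraction of the data for evaluation.

    Returns (train_texts, train_labels, dev_texts, dev_labels). When
    percentage_for_dev is None, everything goes to training and the dev
    lists are empty, mirroring "no evaluation if None".
    """
    if len(text_data) != len(labels):
        raise ValueError("text_data and labels must have the same length")
    if percentage_for_dev is None:
        return list(text_data), list(labels), [], []
    # Assumption of this sketch: the fraction must lie strictly between 0 and 1
    if not 0 < percentage_for_dev < 1:
        raise ValueError("percentage_for_dev must be between 0 and 1")

    # Shuffle indices so texts and labels stay paired after the split
    indices = list(range(len(text_data)))
    random.Random(seed).shuffle(indices)
    n_dev = int(len(indices) * percentage_for_dev)
    dev_idx, train_idx = indices[:n_dev], indices[n_dev:]

    return (
        [text_data[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [text_data[i] for i in dev_idx],
        [labels[i] for i in dev_idx],
    )


texts = ["red dress", "blue jeans", "green hat", "black boots", "white shirt"]
classes = [0, 1, 2, 1, 0]
train_t, train_l, dev_t, dev_l = split_train_dev(texts, classes, percentage_for_dev=0.4)
print(len(train_t), len(dev_t))
```

With very small datasets, a small `percentage_for_dev` can round down to zero held-out examples, so check the dev set size before relying on evaluation scores.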

classmethod from_client(self, client, *args, **kwargs)
classmethod from_dataset(self, dataset: Any, base_model: str = 'distilbert-base-uncased', **kwargs)