Split Sentences#
You can split sentences using a simple function
from relevanceai import Client
client = Client()
ds = client.Dataset("sample")
ds.split_sentences(
text_fields=["sample"]
)
For more fine-grained control, you can use the natural operator:
from relevanceai.operations_new.processing.text.sentence_splitting.ops import (
SentenceSplitterOps,
)
ops = SentenceSplitterOps(language=language)
for c in self.chunk_dataset(select_fields=text_fields):
for text_field in text_fields:
c = ops.run(
text_field=text_field,
documents=c,
inplace=True,
output_field=output_field,
)
self.upsert_documents(c)
If you want to split sentences infinitely, you can simply use this:
classs NewSentenceSplitter(SentenceSplitterOps):
def split_text(self, text):
# the return MUST be a list of texts
return ["text_section_1", "text_section_2"]