This page lists the changes in each release of the Relevance AI Python library.


## What’s Changed

* Feature/dr refactor by @JackyKoh
* fix cluster filters, standardise insert_metadata, fix decimal on cosine by @JackyKoh
* add label rework by @boba-and-beer
* add check_vector_fields by @JackyKoh
* Create test_unit_dev.yaml by @joshp-f
* Implement test_list_closest by @ajhwb
* Feature/sdk 340 labelling rework by @jtwinrelevanceai
* Feature/sdk 382 sentence splitter operator by @boba-and-beer
* Implemented test_apply by @ajhwb
* fix type error for filters by @boba-and-beer
* fixed vectorize by @jtwinrelevanceai
* Improved test_list_closest by @ajhwb
* fi subcluter by @boba-and-beer
* Feature/DR Naming Convention (Alias to be vector name instead.) by @JackyKoh
* Feature/sdk 390 2 by @boba-and-beer
* fixed for concatenating vector fields by @jtwinrelevanceai
* Development readme by @ajhwb
* Added sentence-splitter package by @ajhwb
* Feature/sdk 390 2 by @boba-and-beer
* Feature/sdk 400 partialclusterops by @boba-and-beer
* sklearn integration and updating run/transform by @jtwinrelevanceai
* Feature/sdk 406 fix sentiment ops by @boba-and-beer

## New Contributors

* @joshp-f made their first contribution
* @ajhwb made their first contribution

Full Changelog:…v2.3.0


## What’s Changed

* support stuff by @boba-and-beer
* Feature/fix datasets by @boba-and-beer
* V2.0.0 by @boba-and-beer
* V2.0.1 by @boba-and-beer
* V2.0.2 by @boba-and-beer
* V2.1.0 by @boba-and-beer
* V2.1.0 by @boba-and-beer
* V2.1.1 by @boba-and-beer
* V2.1.2 by @boba-and-beer
* V2.1.3 by @boba-and-beer
* V2.1.4 by @boba-and-beer
* V2.1.5 by @boba-and-beer
* V2.1.6 by @boba-and-beer
* V2.1.8 by @boba-and-beer
* fix the auth header by @boba-and-beer
* fix n_clusters by @boba-and-beer
* fix analyze sentiment by @boba-and-beer
* Distance Matrix data stored in metatdata by @jtwinrelevanceai
* Bad Regex for dataset_id by @jtwinrelevanceai
* changed default algo and n_clusters by @jtwinrelevanceai
* default allLM-mini distillroberta from sentence transformers by @jtwinrelevanceai
* rounding cluster report to 3 decimals by @jtwinrelevanceai
* Smart Typechecking by @jtwinrelevanceai
* Feature/small refactor by @JackyKoh
* Add clear cache and cache info by @boba-and-beer
* hotfix/sdk 361 sub clustering breaks for small parent by @boba-and-beer
* Feature/remove unnecessary service endpoints by @boba-and-beer
* migrate to after_id in docs by @boba-and-beer
* Revert “Feature/remove unnecessary service endpoints” by @boba-and-beer
* Move DocUtils to Main Package by @jtwinrelevanceai
* Readthedocs Feedback & Update by @jtwinrelevanceai
* Feature/fix error by @boba-and-beer
* added request log debugging with env var by @jtwinrelevanceai
* Feature/update chunksearch guides by @boba-and-beer
* Add merge cluster endpoint by @boba-and-beer
* Feature/cluster metadata by @JackyKoh

Full Changelog:…v2.2.0


  • Add summarize_closest_to_center

  • Add guides

  • Add advanced_search tutorial and dataset functionality

  • Add cluster_heatmaps

  • Add advanced_search

  • Re-write of subcluster

  • Refactor of BaseOps

  • Re-write of vectorize and cluster


  • Add question answering operation

  • Fix bug in subclustering where filters do not work


  • Fix for hybrid_search endpoint to include sum_fields



  • auto_cluster -> cluster

  • clusterer.list_closest_to_center() -> clusterer.list_closest()


  • Provide a way to turn off logger

  • auto_cluster now supports models

  • Change metadata experience into more intuitive object

  • Add base workflow

  • Add sentiment analysis workflow

  • Add chunking dataset

  • Add smaller dataset export

  • Fix unique cluster IDs

  • Add feature for workflows

  • Add operate function to run on each cluster

  • Add a way to create centroids if they do not exist using the create_centroids function

  • Fix metadata insertion experience

  • Fix community detection to return clusterops object

  • Move backend of apply and bulk_apply to asynchronous function

  • Add way to list vector fields

  • Add Subclustering

  • Add Sentiment Analysis

  • Complete SDK reference restructure

  • SDK aesthetic overhaul

  • And much more!


  • Major folder refactor -> official renaming of ops to workflows in certain areas

  • Backend separation into interfaces

Automated Changes:

## What’s Changed

* V1.4.1 by @boba-and-beer
* V1.4.2 by @boba-and-beer
* V1.4.3 by @boba-and-beer
* V1.4.3 by @boba-and-beer
* Added Missing Centroid Endpoints by @jtwinrelevanceai
* Feature/refactor folders by @boba-and-beer
* add cachesize max by @boba-and-beer
* Feature/cleanup by @boba-and-beer
* standardised the way that _id is created for each document by @jtwinrelevanceai
* feature/pro-1622-add-dffacets-and-dfaggregate by @ofrighil
* feature/pro-1624-move-certain-files-around by @ofrighil
* move the fitting and predicting to after by @boba-and-beer
* add fix for testing by @boba-and-beer
* Feature/pro 1613 better clusters auto clustering 3 by @boba-and-beer
* Update Metadata Experience by @boba-and-beer
* feature/pro-1309-migrate-datasets-from-australia-to-us by @ofrighil
* Feature/pro 1626 sentiment analysis by @boba-and-beer
* Feature/fix community detection by @boba-and-beer
* Feature/ploty from docs by @jtwinrelevanceai
* Feature/scaling by @jtwinrelevanceai
* Feature/pull update push args by @jtwinrelevanceai
* feature/pro-1507-add-2-series-together-in-pandas by @ofrighil
* [WIP] Feature/add comm detection by @boba-and-beer
* Feature/add centroid insertion by @boba-and-beer
* fix common mistake of inputting token as project by @JackyKoh
* add a way to run the function for operating by @boba-and-beer
* Fix metadata for workflows by @boba-and-beer
* add recieve dataset by @JackyKoh
* Feature/create workflow diagrams by @boba-and-beer
* add parameters for migration by @boba-and-beer
* Fix community detection by @boba-and-beer
* fix distribution measure plot by @boba-and-beer
* feature/pro-1666-improving-original-pull-update-push by @ofrighil
* Feature/add references by @boba-and-beer
* Feature/add references by @boba-and-beer
* fix the metadata insertion by @boba-and-beer
* Feature/pro 1698 fix references by @boba-and-beer
* Feature/cor 722 error shouldnt happen on dev server by @boba-and-beer
* Feature/move ops to workflows init by @boba-and-beer
* feature/pro-1647-fix-progress-bar-for-pull_update_push by @ofrighil
* [WIP] Better Code Base for ClusterOps by @jtwinrelevanceai
* fix community detection UX by @boba-and-beer
* feature/pro-1723-store-vectorize-metadata-in-sdk by @ofrighil
* feature/pro-1726-fix-centroid-insertion-for-community by @ofrighil
* Hotfix/cloudfront by @boba-and-beer
* feature/pro-1724-fix-vectorhub-tests by @ofrighil
* feature/pro-1686-clusterops-show by @ofrighil
* add coco by @boba-and-beer
* SDK Style Guide and Refactor by @jtwinrelevanceai
* Feature/fix refs by @boba-and-beer
* Fixing fit predict by @charyeezy
* feature/pro-1742-change-it-so-we-pass-token-instead-of by @ofrighil
* Feature/pro 1750 by @jtwinrelevanceai
* forward -> operate by @jtwinrelevanceai
* Fix Tests after SDK refactor by @jtwinrelevanceai
* Better Clusters | Internal metric evaluation by @jtwinrelevanceai
* Feature/fix reports by @boba-and-beer
* add reports init file by @JackyKoh
* feature/pro-1751-fixing-the-sync-progress-bar by @ofrighil
* Feature/fix max chunksize by @boba-and-beer
* update refs by @boba-and-beer
* Feature/pro 1782 simple plotting distribution skews by @boba-and-beer
* remoe unstruc by @boba-and-beer
* rename vis to viz by @boba-and-beer
* update makefile by @boba-and-beer
* add sequential workflows by @boba-and-beer
* ensure that you are setting labels on doc subset by @boba-and-beer
* Fix datasets by @boba-and-beer
* add cluster ops by @boba-and-beer
* Increase Coverage by @jtwinrelevanceai
* Fix/config by @boba-and-beer
* fix aggregates by @boba-and-beer
* fixed display after clustering by @jtwinrelevanceai
* Feature/update refs by @boba-and-beer
* Fix centroid insertion by @boba-and-beer
* Feature/update the references amazing wow by @boba-and-beer
* [WIP] Docstrings by @jtwinrelevanceai
* added iris and palmers penguins by @jtwinrelevanceai
* Feature/fix cluster references by @boba-and-beer
* update client ref by @boba-and-beer
* Guides /docsrc by @jtwinrelevanceai
* update the subclusterops by @boba-and-beer
* Feature/pro 1709 fix sentiment analysis workflow by @boba-and-beer
* BaseOps methods by @jtwinrelevanceai
* update sentiment by @boba-and-beer
* fix vectorize by @boba-and-beer
* fix subclustering by @boba-and-beer


  • Reduced pull_update_push log file output

  • Add delete_documents utility

  • Add deployables functions

  • Check if global datasets already exist


  • Rename image to media


  • Fix bug with upsert_images

  • Suggest link with dashboard link


  • Improve Dataset.community_detection such that it takes vectors as well

  • Add support for image uploads


  • Add metadata


  • Add verbose argument

  • Fix cluster_keyphrases


  • Added pull_update_push_async

  • Introduced asynchronous client

  • Fix bug in facets


  • Add support for subclustering

  • Add community detection algorithm Dataset.community_detection

  • Update Dataset.vectorize to ignore already-vectorized fields and modified output to include those vector names

Additional info on PRs:


  • Add dimensionality reduction for documents

  • Change maximum chunksize to 500


  • Adjust max cache size, from one to eight, of Dataset.to_pandas_dataframe and Series._get_pandas_series

  • Fix dataset analytics
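
The effect of raising a cache size from one to eight can be sketched with the standard library. The changelog does not say how the SDK implements this cache, so the use of `functools.lru_cache` is an assumption here, and `make_loader` and `load` are hypothetical stand-ins for the cached methods named above.

```python
from functools import lru_cache


def make_loader(maxsize):
    """Build a cached loader and a log of the calls that actually ran."""
    calls = []

    @lru_cache(maxsize=maxsize)
    def load(dataset_id):
        # Hypothetical stand-in for an expensive fetch of a whole dataset.
        calls.append(dataset_id)
        return {"dataset_id": dataset_id}

    return load, calls


access_pattern = ["a", "b", "a", "b"]

# With a single slot, alternating between two datasets evicts on every call.
small_load, small_calls = make_loader(maxsize=1)
for dataset_id in access_pattern:
    small_load(dataset_id)

# With eight slots, both datasets stay cached after their first load.
large_load, large_calls = make_loader(maxsize=8)
for dataset_id in access_pattern:
    large_load(dataset_id)

large_info = large_load.cache_info()
```

In this sketch the one-slot cache runs the loader four times, while the eight-slot cache runs it twice and serves the other two calls from memory.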


  • Add initial bias detection

  • Fix analytics support

  • Remove test tracking


  • Add hotfix if pandas functions not supported.


  • Add nltk-rake support for keyphrases

  • Add more documentation around cluster reporting

  • Enable Dataset and Series to access pandas DataFrame and Series methods, respectively

  • Change from a property to a method and add pandas DataFrame output

  • Change Dataset.vectorize to call pull_update_push just once instead of twice


  • Add Cluster Report endpoints

Developer changes:

  • Fix bug with analytics and change to an env variable tracker for outermost function


Developer changes:


  • All list and dict default arguments are changed to None.
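
The reason for this kind of change is a general Python behavior: a mutable default argument is created once and then shared across calls. A minimal sketch follows; the function names are hypothetical and are not taken from the SDK.

```python
def add_tag_shared(tag, tags=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `tags` mutates the same object.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # A None default lets each call build its own fresh list.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


first_shared = add_tag_shared("a")
second_shared = add_tag_shared("b")  # same list object as first_shared

first = add_tag("a")
second = add_tag("b")  # independent of first
```

With the list default, the second call sees the tag left behind by the first; with the `None` default, each call starts from an empty list.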

Other Changes:

  • Introduced corr, a method to plot the correlation between two fields, in Dataset

  • Export to Pandas DataFrame



  • When upserting, confusing insert/write statements are no longer returned.

Other Changes:

  • Add option to create_id when inserting

Developer changes:

  • Reduced number of documents in testing

  • Make tracking only occur at the uppermost level and not the bottom level



  • When inserting or writing, confusing insertion/write statements are no longer returned; if the call errors, it returns the JSON object with the necessary details.

  • Add image tooling around processing (currently an alpha feature to be tested)

  • Add vectorize method for text and images



  • Add grading to auto_clustering

  • Bug fix for cluster report

  • Add DBSCAN centroids


  • Add support for BIRCH, OPTICS and all native sklearn algorithms


  • Added new DR methods to auto_reduce_dimensions

  • Fixed documentation on clustering


  • Change data structure of report structure


  • Add low-touch way to label with a given model

  • Add label_from_dataset, label_from_list, label_from_common_words


  • Fix document-utils for clustering on DR


  • Add grading for cluster report


  • Fix http client and regionalisation issues and remove need for firebase


Breaking changes:

  • get_cluster_internal_report has now been renamed to internal_report

Non-breaking changes:

  • Remove repetitive print statements

  • Add outlier support for cluster report

  • Support for centroids and medoids in typing

  • Add pretty printing for cluster overall reporting


  • add launch_search_app for dataset functionality

  • remove saving .creds.json to avoid file caching


  • Fix print error message with segment

  • Separate out JSON Encoder


  • Fix pandas serialization for UTF-encoding errors

  • Move search app

  • Change print search dashboard app URL

  • Fix regionalisation error when authenticating client.


  • Make pandas dataframe serializable with vectors


  • Clustering report functionality

  • Add fix and test for new cluster aggregate

  • Add document mocking utility

  • Add integration for cluster reporting

  • Fix bug for sklearn clustering

  • Add segment tracking with option to turn off

  • Add print statement after inserting


  • Fix warning missing parameter

  • Remove dataset_id from get_documents

  • Fix URL bug if you are logging in from old-australia-east


  • Fix UX flow

  • Make US-East-1 the default

  • Add force refresh

  • Rework Login UX

  • Mention region when connecting

  • Make the authentication message super cool

  • Fix centroids to Node endpoint

  • Update the delete request


  • Make asynchronous dashboard request


  • Fix cluster aggregate

  • Fix for login

  • Make adding firebase UID not breaking



Breaking changes:

  • predict_dataset has been corrected to predict_update

  • fit_dataset_by_partial has been corrected to partial_fit_dataset

  • fit_partial instances have been corrected to partial_fit

  • Hotfix auto_cluster when having more clusters than batch size

  • Add dashboard link after clustering

  • Fix references when listing closest and furthest


The most important part of this change is adding more modularity to the clustering functions. This is important because previous functions tried to abstract away too much. Now, users


  • Clustering fit_transform is now fit_predict, to align with sklearn’s methods

  • Rename Clusterer to ClusterOps

  • fit has now been broken down into fit_predict_update

  • Removed KMeansClusterer

Non-breaking changes:

  • Create a CentroidClusterBase and update it to ClusterBase and a CentroidBase

  • Added a fit_update

  • Added support for batch clustering using MiniBatchKMeans

  • Added functional Insert_centroid_documents to the ClusterOps object

  • Introduced fit_partial to the clusterer

  • Introduced fit_partial_documents

  • Introduced fit_dataset_by_partial to allow users to be able to fit on a dataset if they want to use partial_fit

  • Introduced fit_update_dataset

  • Introduced fit_update_dataset_by_partial which will fit the dataset, predict the dataset and insert the centroids if there are expected centroids in the dataset

  • Introduced fit_partial_predict_update to allow for fitting, predicting and updating the dataset in 1 go

  • Fixed arguments in the clusterer object to now take an optional vector_fields and dataset

  • Feature/fix clustering transform by @boba-and-beer

  • add fix for dim reduction by @boba-and-beer

  • removed python manta on startup by @jtwinrelevanceai

  • Feature/add support for batch by @boba-and-beer

  • Hotfix/pull update filter error by @boba-and-beer

  • auto_cluster function by @jtwinrelevanceai

  • Feature/try fix cluster references by @boba-and-beer

Full Changelog:…v0.33.0


  • Apply hotfix to pull_update_push



  • Move search to inside operations to keep consistency

New Features:

  • Added Dimensionality Reduction

  • Added Labelling

Non-breaking changes:

  • Fix bug with clusterer using fit_predict now

Full Changelog:…v0.32.0


  • Include more native sklearn integration. KMeans and MiniBatchKMeans now supported natively.

  • Fix to vectorize and sample in Series

  • Fixes to cluster aggregation for the clusterer class and cluster metrics for the clusterer class

  • groupby and agg now supported

  • Added warnings to vectorize method

  • Bug Fix to list_closest_to_center to now return results

  • Add send_dataset

  • Add clone_dataset

  • Add references to available example datasets

  • Added vector_search, chunk_search, multistep_chunk_search, hybrid_search as part of the search endpoints

Developer changes:

  • Added warnings module (boba-and-beer)

  • Folder refactor for datasets API (boba-and-beer)

  • 2x Test speed up by introducing pytest-xdist with file distribution strategy (boba-and-beer)

Tests are now run modularly. In other words, if you want tests to run together, keep them in the same file. If you want them to run in parallel, keep them in separate files.


Non-breaking changes:

  • Fixed incorrect reference in update_documents

  • Fixed bulk getting the wrong document in df.get() and added subsequent unit test

  • Fixed references with apply

  • Added health endpoints

  • Added insert_pandas_dataframe endpoints

  • Test folder refactor and clean up

Developer changes:

  • Forced precommits

  • Added minimum pytest coverage

Auto Generated Release Notes:



  • Renamed all docs references to documents

  • Renamed all cluster_alias references to alias

  • Changed functionality in CentroidClusterBase

  • Renamed chunk_size to chunksize in get_all_documents

  • Renamed retrieve_chunk_size to retrieve_chunksize in df.apply and df.bulk_apply

  • Schema is now a property and not a method!

  • get_centroid_documents no longer takes a field

  • Removal of any mention of centroid_vector_ as those should now be replaced with the actual vector field name the centroids are derived from

Non-breaking changes:

  • Added head to Series object

  • Add CentroidClustererbase and CentroidClusterBase classes to inherit from

  • Deprecated KMeansClusterer in documentation and functionality

  • Add fix for clusterer for missing vectors in documents by forcing filters

  • Support for multi-region base URL based on frontend parsing

  • Added AutoAPI to gitignore as we no longer want to measure that

  • Add tighter sklearn integration

  • Add CentroidClusterBase

  • Clean up references around Clusterbase, ClusterOps, Dataset

  • Add reference to Client object

  • Hotfix .sample()

  • Update the Base Ingest URL to gateway and set to appropriate default

  • Added support for base url token

  • Removed QC from references

  • Add integration reference

  • Fixed centroid insertion for Dataset

  • Refactor of tests based

  • Add tests around clustering

  • Separation of references to clean up clustering and sidebar menu navigation

  • Fix reference examples







  • *Breaking Change* Change pull_update_push to use dataset ID

  • Added centroid distance evaluation

  • Added JSONShower to df.head() so previewing images is now possible

  • Refactor Pandas Dataset API to use BatchAPIClient

  • Modularise testing infrastructure to use separate datasets

  • Add aggregation, groupby pandas API support

  • Added GroupBy, Series class for Datasets

  • Added

  • Added documentation testing

  • Added df.apply()

  • Added additional functionality for sampling etc.

  • Fixed documentation for Datasets API

  • Add new monitoring health test for chunk data structure

  • Fix CSV reading so that _chunk_ fields are parsed as actual Python objects and not strings
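
The CSV fix for `_chunk_` fields can be illustrated with the standard library. This is only a sketch of the general idea, not the SDK's implementation: the column name `sentences_chunk_`, the helper `parse_chunk_fields` and the use of `ast.literal_eval` are all assumptions.

```python
import ast
import csv
import io

# Write a small CSV in memory; the chunk column holds a list of dicts,
# which CSV can only store as its string representation.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["_id", "sentences_chunk_"])
writer.writerow(["doc-1", str([{"sentence": "hello"}, {"sentence": "world"}])])
buffer.seek(0)


def parse_chunk_fields(row):
    """Turn stringified chunk columns back into Python objects."""
    parsed = dict(row)
    for key, value in row.items():
        if "_chunk_" in key:
            # literal_eval only accepts Python literals, so it is safer
            # than eval for data read from a file.
            parsed[key] = ast.literal_eval(value)
    return parsed


rows = [parse_chunk_fields(row) for row in csv.DictReader(buffer)]
```

After parsing, `rows[0]["sentences_chunk_"]` is a list of dictionaries again instead of a single string.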


  • Fixed datasets.documents.update_where so it runs

  • Added more tests around multivector search

  • Added Pandas-like Dataset Class for interacting with SDK (Alpha)

  • Added datasets.cluster.centroids.list_furthest_from_centers and datasets.cluster.centroids.list_closest_to_centers

  • Folder Refactor


  • Fix missing import in plotting since internalising plots

  • Add support for vector labels

  • Remove background axes from plot


  • Fix incorrect URL being submitted to frontend


  • Fix string parsing issue for endpoints and dashboards


  • Cluster labels are now lower case

  • Bug fix on centroids furthest from center

  • Changed error message

  • Fixed Dodgy string parsing

  • Fixed bug with kmeans_cluster 1 liner by supporting getting multiple centers


  • Add CSV insertion

  • Make JSON encoder utility class for easier customisation

  • Added smarter parsing of CSV
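
A JSON encoder utility class of the kind mentioned above is usually a subclass of the standard library's `json.JSONEncoder`. The sketch below shows that general pattern only; the class name `CustomEncoder` and the types it handles are assumptions, not the SDK's actual encoder.

```python
import json
from datetime import datetime


class CustomEncoder(json.JSONEncoder):
    """Hypothetical encoder that handles types json cannot serialize."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, set):
            # Sets have no JSON equivalent; emit a sorted list instead.
            return sorted(obj)
        # Fall back to the base class, which raises TypeError.
        return super().default(obj)


document = {"_id": "doc-1", "created": datetime(2022, 1, 1), "tags": {"b", "a"}}
encoded = json.dumps(document, cls=CustomEncoder)
decoded = json.loads(encoded)
```

Keeping the conversions in one class means new types can be supported by adding a branch to `default`, without touching the call sites.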


  • Bug fixes


  • Added JSON serialization and consequent test updates

  • Bug fix to cluster metrics

  • Minor fix to tests