relevanceai.utils.doc_utils.read_utils

Module Contents

class relevanceai.utils.doc_utils.read_utils.DocReadUtils

A mixin that other classes can inherit to gain the document-reading helpers below.

classmethod get_field(self, field: str, doc: Dict, missing_treatment='raise_error')

For nested dictionaries, tries to access a field using dot notation. For example, field = "kfc.item" returns "chickens" for the document below.

{
    "kfc": {
        "item": "chickens"
    }
}

Parameters
  • field – Field of a document, with nested keys separated by dots.

  • doc – The document to read from.

  • missing_treatment – Can be one of return_empty_string/return_none/raise_error.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': 'chickens'}}
>>> vi_client.get_field('kfc.item', sample_document) == 'chickens'
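
The dotted-path lookup and the three missing_treatment options described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name get_nested_field is hypothetical, and raising KeyError for raise_error is an assumption about the error type.

```python
from typing import Any, Dict


def get_nested_field(field: str, doc: Dict, missing_treatment: str = "raise_error") -> Any:
    """Hypothetical stand-in for get_field: walk a dotted path through nested dicts."""
    current: Any = doc
    for key in field.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
            continue
        # The path is broken at this key, so apply the requested treatment.
        if missing_treatment == "return_empty_string":
            return ""
        if missing_treatment == "return_none":
            return None
        raise KeyError(f"field {field!r} not found in document")
    return current


sample_document = {"kfc": {"item": "chickens"}}
print(get_nested_field("kfc.item", sample_document))                  # chickens
print(get_nested_field("kfc.price", sample_document, "return_none"))  # None
```

Splitting on "." means a key that itself contains a dot cannot be addressed with this sketch.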
classmethod get_fields(self, fields: List[str], doc: Dict, missing_treatment='return_empty_string') → List[Any]

For nested dictionaries, tries to access each field in a list of fields. For example, fields = ["kfc.item"] returns ["chickens"] for the document below.

{
    "kfc": {
        "item": "chickens"
    }
}

Parameters
  • fields – List of fields of a document.

  • doc – The document to read from.

  • missing_treatment – How a missing field is handled. Defaults to return_empty_string.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': 'chickens'}}
>>> vi_client.get_fields(['kfc.item'], sample_document) == ['chickens']
get_field_across_documents(self, field: str, docs: List[Dict], missing_treatment: str = 'return_empty_string')

For a list of nested dictionaries, tries to access the same field in each document. For example, field = "kfc.item" yields "chickens" for a document such as the one below.

{
    "kfc": {
        "item": "chickens"
    }
}

Parameters
  • field – Field of a document.

  • docs – List of documents.

  • missing_treatment – Can be one of 'skip', 'return_empty_string'.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> documents = vi_client.create_sample_documents(10)
>>> vi_client.get_field_across_documents('size.cm', documents)
# returns 10 values in the nested dictionary
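
The difference between the 'skip' and 'return_empty_string' treatments is easiest to see on a small list where one document lacks the field. The sketch below is an assumption-labelled illustration in plain Python; field_across_documents and _lookup are hypothetical names, not the library's code.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel marking an absent field


def _lookup(field: str, doc: Dict) -> Any:
    """Walk a dotted path; return the sentinel if any key is absent."""
    current: Any = doc
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def field_across_documents(
    field: str, docs: List[Dict], missing_treatment: str = "return_empty_string"
) -> List[Any]:
    """Hypothetical sketch: collect one field from every document."""
    values = []
    for doc in docs:
        value = _lookup(field, doc)
        if value is _MISSING:
            if missing_treatment == "skip":
                continue  # drop this document from the result
            value = ""    # return_empty_string: keep a placeholder
        values.append(value)
    return values


documents = [{"size": {"cm": 10}}, {"size": {}}, {"size": {"cm": 30}}]
print(field_across_documents("size.cm", documents))          # [10, '', 30]
print(field_across_documents("size.cm", documents, "skip"))  # [10, 30]
```

With 'skip' the result can be shorter than the input list, so positions no longer line up with the original documents.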
get_fields_across_document(self, fields: List[str], doc: Dict, missing_treatment='return_empty_string')

Get multiple fields from a single document.

get_fields_across_documents(self, fields: List[str], docs: List[Dict], missing_treatment='return_empty_string')

Get numerous fields across documents.

Example

For documents:

docs = [
    {"value": 2, "type": "car"},
    {"value": 10, "type": "bus"}
]

>>> DocUtils().get_fields_across_documents(["value", "type"], docs)
[2, "car", 10, "bus"]
Parameters

missing_treatment (str) – Can be one of ["skip", "return_empty_string", "return_none", "skip_if_any_missing"]. If "skip_if_any_missing", a document is excluded when any of the requested fields is missing.
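
The "skip_if_any_missing" option amounts to filtering the documents before any values are read. A minimal sketch of that filtering step, assuming dotted field names; has_field and docs_with_all_fields are hypothetical helpers written for illustration only:

```python
from typing import Dict, List


def has_field(field: str, doc: Dict) -> bool:
    """Return True if the dotted path exists in the document."""
    current = doc
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


def docs_with_all_fields(fields: List[str], docs: List[Dict]) -> List[Dict]:
    """Keep only the documents that contain every requested field."""
    return [doc for doc in docs if all(has_field(f, doc) for f in fields)]


docs = [
    {"value": 2, "type": "car"},
    {"value": 10},  # no "type" field, so this document is dropped
]
print(docs_with_all_fields(["value", "type"], docs))  # [{'value': 2, 'type': 'car'}]
```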

filter_docs_for_fields(self, fields: List, docs: List)

Return only the docs that contain all of the listed fields.

classmethod is_field(self, field: str, doc: Dict) → bool

Checks whether a field, possibly nested, exists in a document. For example, field = "kfc.item" returns True for the document below.

{
    "kfc": {
        "item": "chickens"
    }
}

Parameters
  • field – Field of a document.

  • doc – The document to check.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': 'chicken'}}
>>> vi_client.is_field('kfc.item', sample_document) == True
is_field_across_documents(self, field, documents)
static list_doc_fields(doc: dict) → List[str]

Returns all fields in a document. Nested fields are flattened into dotted names.

Example

input: doc = {'a': {'b': 'v', 'c': 'v'}, 'd': 'v', 'e': {'f': {'g': 'v'}}}

output: ['d', 'a.b', 'a.c', 'e.f.g']
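
Flattening nested keys into dotted names is a short recursion. The sketch below is one way to do it, not necessarily how the library does it; list_fields is a hypothetical name, and the order of the returned names follows dict insertion order, which may differ from the order shown above.

```python
from typing import Dict, List


def list_fields(doc: Dict, prefix: str = "") -> List[str]:
    """Hypothetical sketch: collect dotted names for every leaf value."""
    fields: List[str] = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the path so far.
            fields.extend(list_fields(value, path))
        else:
            fields.append(path)
    return fields


doc = {"a": {"b": "v", "c": "v"}, "d": "v", "e": {"f": {"g": "v"}}}
print(sorted(list_fields(doc)))  # ['a.b', 'a.c', 'd', 'e.f.g']
```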

classmethod subset_documents(self, fields: List[str], docs: List[Dict], missing_treatment: str = 'return_none') → List[Dict]
Parameters
  • fields – A list of fields of interest.

  • docs – A list of documents that may or may not have the chosen fields.

  • missing_treatment – Can be one of return_empty_string/return_none/raise_error.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> docs = [
...     {"kfc": {"food": "chicken nuggets", "drink": "soda"}},
...     {"mcd": {"food": "hamburger", "drink": "pop"}}
... ]
>>> fields = [
...     "kfc.food", "kfc.drink", "mcd.food", "mcd.drink"
... ]
>>> vi_client.subset_documents(fields, docs, missing_treatment="return_empty_string") == [
...     {
...         "kfc.food": "chicken nuggets", "kfc.drink": "soda",
...         "mcd.food": "", "mcd.drink": ""
...     },
...     {
...         "kfc.food": "", "kfc.drink": "",
...         "mcd.food": "hamburger", "mcd.drink": "pop"
...     }
... ]
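
The shape of the result (one flat dictionary per document, keyed by the dotted field names) can be reproduced with a short standalone sketch. The names lookup and subset are hypothetical, and the default argument stands in for the missing_treatment options; this is an illustration, not the library's code.

```python
from typing import Any, Dict, List


def lookup(field: str, doc: Dict, default: Any = None) -> Any:
    """Walk a dotted path; return default if any key is absent."""
    current: Any = doc
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def subset(fields: List[str], docs: List[Dict], default: Any = None) -> List[Dict]:
    """One flat dict per document, keyed by the dotted field names."""
    return [{field: lookup(field, doc, default) for field in fields} for doc in docs]


docs = [
    {"kfc": {"food": "chicken nuggets", "drink": "soda"}},
    {"mcd": {"food": "hamburger", "drink": "pop"}},
]
result = subset(["kfc.food", "mcd.food"], docs, default="")
print(result[0])  # {'kfc.food': 'chicken nuggets', 'mcd.food': ''}
```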