relevanceai.utils.doc_utils.read_utils
Module Contents
- class relevanceai.utils.doc_utils.read_utils.DocReadUtils
A mixin that other classes can inherit from to read fields from (possibly nested) documents.
- classmethod get_field(self, field: str, doc: Dict, missing_treatment='raise_error')
For nested dictionaries, tries to access a field given in dot notation. For example, field = kfc.item should return "chickens" for the doc below.
{
    "kfc": {
        "item": "chickens"
    }
}
- Parameters
field – Field of a document.
doc – Document.
missing_treatment – Can be one of return_empty_string/return_none/raise_error
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': 'chickens'}}
>>> vi_client.get_field('kfc.item', sample_document) == 'chickens'
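The lookup and the three missing_treatment options described above can be illustrated without a client. The function below is a minimal sketch, not the library's implementation: get_field_sketch and the _MISSING sentinel are hypothetical names, and raising KeyError for raise_error is an assumption about the error type.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel, so a stored None still counts as "present"


def get_field_sketch(field: str, doc: Dict, missing_treatment: str = "raise_error") -> Any:
    """Hypothetical stand-in for get_field; assumes '.' separates nesting levels."""
    current: Any = doc
    for key in field.split("."):
        # Descend one level per dotted segment.
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            current = _MISSING
            break
    if current is not _MISSING:
        return current
    if missing_treatment == "return_empty_string":
        return ""
    if missing_treatment == "return_none":
        return None
    # Assumption: the real error type may differ.
    raise KeyError(f"Field {field!r} not found in document")


sample_document = {"kfc": {"item": "chickens"}}
print(get_field_sketch("kfc.item", sample_document))  # chickens
print(get_field_sketch("kfc.price", sample_document, "return_none"))  # None
```

With the default raise_error, a missing field surfaces immediately, while the other two options let batch code continue with a placeholder value.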
- classmethod get_fields(self, fields: List[str], doc: Dict, missing_treatment='return_empty_string') → List[Any]
For nested dictionaries, tries to access each field in a list of fields. For example, fields = ["kfc.item"] should return ["chickens"] for the doc below.
{
    "kfc": {
        "item": "chickens"
    }
}
- Parameters
fields – List of fields of a document.
doc – Document.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': 'chickens'}}
>>> vi_client.get_fields(['kfc.item'], sample_document) == ['chickens']
- get_field_across_documents(self, field: str, docs: List[Dict], missing_treatment: str = 'return_empty_string')
For a list of nested dictionaries, tries to access the same field in every document. For example, field = kfc.item should return "chickens" for the doc below.
{
    "kfc": {
        "item": "chickens"
    }
}
- Parameters
field – Field of a document.
docs – List of documents.
missing_treatment – Can be one of 'skip', 'return_empty_string'
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> documents = vi_client.create_sample_documents(10)
>>> vi_client.get_field_across_documents('size.cm', documents)
# returns 10 values in the nested dictionary
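To show how 'skip' differs from 'return_empty_string' when some documents lack the field, here is a small standalone sketch. It is an illustration under assumptions rather than the library's code: field_across_documents_sketch is a hypothetical name, and rejecting other missing_treatment values with ValueError is a choice made for this example.

```python
from typing import Any, Dict, List


def field_across_documents_sketch(
    field: str, docs: List[Dict], missing_treatment: str = "return_empty_string"
) -> List[Any]:
    """Hypothetical stand-in: collect one dotted field from every document."""
    if missing_treatment not in ("skip", "return_empty_string"):
        raise ValueError(f"Unsupported missing_treatment: {missing_treatment!r}")
    values: List[Any] = []
    for doc in docs:
        current: Any = doc
        found = True
        for key in field.split("."):
            if isinstance(current, dict) and key in current:
                current = current[key]
            else:
                found = False
                break
        if found:
            values.append(current)
        elif missing_treatment == "return_empty_string":
            values.append("")  # keep the output aligned with the input documents
        # "skip": contribute nothing for this document
    return values


documents = [{"size": {"cm": 10}}, {"size": {}}, {"size": {"cm": 30}}]
print(field_across_documents_sketch("size.cm", documents))          # [10, '', 30]
print(field_across_documents_sketch("size.cm", documents, "skip"))  # [10, 30]
```

Note that 'skip' shortens the result, so positions in the output no longer correspond to positions in the input list.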
- get_fields_across_document(self, fields: List[str], doc: Dict, missing_treatment='return_empty_string')
Get numerous fields across a document.
- get_fields_across_documents(self, fields: List[str], docs: List[Dict], missing_treatment='return_empty_string')
Get numerous fields across documents.
Example
For documents:
docs = [
    {
        "value": 2, "type": "car"
    },
    {
        "value": 10, "type": "bus"
    }
]
>>> DocUtils().get_fields_across_documents(["value", "type"], docs)
[2, "car", 10, "bus"]
- Parameters
missing_treatment (str) – Can be one of [“skip”, “return_empty_string”, “return_none”, “skip_if_any_missing”] If “skip_if_any_missing”, the document will not be included if any field is missing
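The four missing_treatment options are easiest to compare side by side. The sketch below is a hypothetical re-implementation for illustration only: the names _lookup and fields_across_documents_sketch are invented here, and the flat output list is an assumption taken from the example above, not from the library source.

```python
from typing import Any, Dict, List, Tuple


def _lookup(field: str, doc: Dict) -> Tuple[bool, Any]:
    """Return (found, value) for a dotted field path."""
    current: Any = doc
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return False, None
        current = current[key]
    return True, current


def fields_across_documents_sketch(
    fields: List[str], docs: List[Dict], missing_treatment: str = "return_empty_string"
) -> List[Any]:
    """Hypothetical stand-in; flattens values into one list, as in the example above."""
    fillers = {"return_empty_string": "", "return_none": None}
    values: List[Any] = []
    for doc in docs:
        results = [_lookup(field, doc) for field in fields]
        if missing_treatment == "skip_if_any_missing" and not all(found for found, _ in results):
            continue  # drop the whole document
        for found, value in results:
            if found:
                values.append(value)
            elif missing_treatment in fillers:
                values.append(fillers[missing_treatment])
            # "skip": omit only the missing value
    return values


docs = [{"value": 2, "type": "car"}, {"value": 10}]
print(fields_across_documents_sketch(["value", "type"], docs))
# [2, 'car', 10, '']
print(fields_across_documents_sketch(["value", "type"], docs, "skip_if_any_missing"))
# [2, 'car']
```

"skip" removes only the missing values, whereas "skip_if_any_missing" removes every value from a document that lacks any requested field.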
- filter_docs_for_fields(self, fields: List, docs: List)
Filters docs, keeping only those that contain every field in the list.
- classmethod is_field(self, field: str, doc: Dict) → bool
For nested dictionaries, checks whether a field exists. For example, field = kfc.item should return True for the doc below.
{
    "kfc": {
        "item": "chickens"
    }
}
- Parameters
field – Field of a document.
doc – Document.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': 'chicken'}}
>>> vi_client.is_field('kfc.item', sample_document) == True
- is_field_across_documents(self, field, documents)
- static list_doc_fields(doc: dict) → List[str]
Returns all fields in a document; nested fields are flattened into dot notation.
Example
input: doc = {'a': {'b': 'v', 'c': 'v'}, 'd': 'v', 'e': {'f': {'g': 'v'}}}
output: ['d', 'a.b', 'a.c', 'e.f.g']
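The flattening can be pictured as a recursive walk that joins keys with dots. This is a minimal sketch under assumptions, not the library's implementation: list_doc_fields_sketch and its prefix parameter are hypothetical, and the order of the returned fields may differ from the order shown in the documented output.

```python
from typing import Dict, List


def list_doc_fields_sketch(doc: Dict, prefix: str = "") -> List[str]:
    """Hypothetical stand-in: return dotted paths to every non-dict value."""
    fields: List[str] = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the path so far.
            fields.extend(list_doc_fields_sketch(value, path))
        else:
            fields.append(path)
    return fields


doc = {"a": {"b": "v", "c": "v"}, "d": "v", "e": {"f": {"g": "v"}}}
print(sorted(list_doc_fields_sketch(doc)))  # ['a.b', 'a.c', 'd', 'e.f.g']
```

Sorting, or comparing as sets, avoids relying on any particular ordering of the result.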
- classmethod subset_documents(self, fields: List[str], docs: List[Dict], missing_treatment: str = 'return_none') → List[Dict]
- Parameters
fields – A list of fields of interest.
docs – A list of documents that may or may not have the chosen fields.
missing_treatment – Can be one of return_empty_string/return_none/raise_error
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> docs = [
...     {"kfc": {"food": "chicken nuggets", "drink": "soda"}},
...     {"mcd": {"food": "hamburger", "drink": "pop"}}
... ]
>>> fields = [
...     "kfc.food", "kfc.drink", "mcd.food", "mcd.drink"
... ]
>>> vi_client.subset_documents(fields, docs, missing_treatment="return_empty_string") == [
...     {
...         "kfc.food": "chicken nuggets", "kfc.drink": "soda",
...         "mcd.food": "", "mcd.drink": ""
...     },
...     {
...         "kfc.food": "", "kfc.drink": "",
...         "mcd.food": "hamburger", "mcd.drink": "pop"
...     }
... ]
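The shape of the result, one flat dictionary per document keyed by the dotted field names, can be reproduced with a short standalone sketch. It is an illustration under assumptions, not the library's code: subset_documents_sketch is a hypothetical name, and raising KeyError for raise_error is an assumption about the error type.

```python
from typing import Any, Dict, List


def subset_documents_sketch(
    fields: List[str], docs: List[Dict], missing_treatment: str = "return_none"
) -> List[Dict]:
    """Hypothetical stand-in: keep only the chosen fields, keyed by dotted name."""
    fillers = {"return_empty_string": "", "return_none": None}
    subset: List[Dict] = []
    for doc in docs:
        row: Dict[str, Any] = {}
        for field in fields:
            current: Any = doc
            for key in field.split("."):
                if isinstance(current, dict) and key in current:
                    current = current[key]
                else:
                    if missing_treatment not in fillers:
                        # Assumption: "raise_error" surfaces as a KeyError here.
                        raise KeyError(f"Field {field!r} not found in document")
                    current = fillers[missing_treatment]
                    break
            row[field] = current
        subset.append(row)
    return subset


docs = [
    {"kfc": {"food": "chicken nuggets", "drink": "soda"}},
    {"mcd": {"food": "hamburger", "drink": "pop"}},
]
fields = ["kfc.food", "kfc.drink", "mcd.food", "mcd.drink"]
for row in subset_documents_sketch(fields, docs, "return_empty_string"):
    print(row)
```

Every output dictionary has the same keys, which makes the result convenient to load into a table; in this sketch the default return_none fills missing fields with None rather than an empty string.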