Datasets

The Relevance AI platform offers free example datasets for users. These datasets are licensed under Apache 2.0.

class relevanceai.utils.datasets.ExampleDatasets

Bases: object

__init__()
get_dataset(name, number_of_documents=None, select_fields=None)

Download an example dataset

Parameters
  • name (string) – Name of example dataset

  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
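The select_fields semantics described above (an empty list keeps every field) can be illustrated on in-memory documents. This is only a sketch: keep_fields is a hypothetical helper, not part of relevanceai, and the two sample documents are toy data.

```python
from typing import Any, Dict, List, Optional


def keep_fields(
    documents: List[Dict[str, Any]],
    select_fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only the requested top-level fields."""
    # None or an empty list means "all fields", mirroring the parameter description.
    if not select_fields:
        return [dict(doc) for doc in documents]
    wanted = set(select_fields)
    return [{k: v for k, v in doc.items() if k in wanted} for doc in documents]


# Toy documents, not real dataset rows.
sample = [
    {"_id": "a", "product_title": "Necklace", "source": "eBay"},
    {"_id": "b", "product_title": "Console", "source": "eBay"},
]

titles_only = keep_fields(sample, ["_id", "product_title"])
everything = keep_fields(sample, [])
```

Here titles_only drops the source field, while everything is a copy of the input because the field list is empty.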

list_datasets()

List of example datasets available to download

relevanceai.utils.datasets.example_documents(dataset_id, number_of_documents=None)
relevanceai.utils.datasets.get_coco_dataset(number_of_documents=1000, include_vector=True, select_fields=None)

Get the COCO dataset

relevanceai.utils.datasets.get_dummy_ecommerce_dataset(number_of_documents=739, select_fields=None)

Download an example e-commerce dataset

Total Len: 739

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_image_clip_vector_': [...],
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'product_title_clip_vector_': [...],
    'query': 'steel necklace',
    'source': 'eBay'
}
Return type

List[Dict[Any, Any]]
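The *_clip_vector_ fields in the example hold embedding vectors, and a typical use is comparing two documents by cosine similarity. The sketch below assumes the vectors are plain lists of floats; the 3-dimensional vectors are toy stand-ins for the values elided as [...] above, and cosine_similarity is an illustrative helper, not a relevanceai function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy documents shaped like the example; real vectors are much longer.
doc_a = {"_id": "a", "product_title_clip_vector_": [1.0, 0.0, 0.0]}
doc_b = {"_id": "b", "product_title_clip_vector_": [1.0, 1.0, 0.0]}

similarity = cosine_similarity(
    doc_a["product_title_clip_vector_"],
    doc_b["product_title_clip_vector_"],
)
```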

relevanceai.utils.datasets.get_ebay_app_review_dataset(number_of_documents=100, select_fields=None)

Download an example dataset of Play Store reviews of the eBay app

Total Len: 10000

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': '4b9b92c3-011d-4f43-98ca-e131958a49f4',
    'at': datetime.datetime(2022, 6, 20, 12, 22, 23),
    'content': "PLEASE change the way your app works in terms of swiping through images. If you pull down, you refresh the page. If you swipe left/right, you change images. Problem is, it's far too easy to accidentally pull down while swiping left/right, which ends up resetting the gallery!!!! Please fix this! Otherwise, app works as expected.",
    'repliedAt': None,
    'replyContent': None,
    'reviewCreatedVersion': '6.64.0.3',
    'reviewId': '4b9b92c3-011d-4f43-98ca-e131958a49f4',
    'score': 4.0,
    'thumbsUpCount': 50,
    'userImage': 'https://play-lh.googleusercontent.com/a/AATXAJwrSs35SJYs5BUzJ2blj0zJagZgUZuPfglwcT_f=mo',
    'userName': 'Mitchel Wood'
}
Return type

List
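Documents shaped like the review example can be aggregated with plain Python, for instance to get the mean score per app version. This is a sketch on toy reviews that reuse the field names from the example (reviewCreatedVersion, score); the second version string and all scores are made up, and mean_score_by_version is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List


def mean_score_by_version(reviews: List[Dict[str, Any]]) -> Dict[str, float]:
    """Average the 'score' field per 'reviewCreatedVersion'."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for review in reviews:
        version = review.get("reviewCreatedVersion")
        score = review.get("score")
        # Skip reviews that lack a version or a score.
        if version is None or score is None:
            continue
        totals[version] += score
        counts[version] += 1
    return {version: totals[version] / counts[version] for version in totals}


# Toy reviews, not rows from the real dataset.
reviews = [
    {"reviewCreatedVersion": "6.64.0.3", "score": 4.0},
    {"reviewCreatedVersion": "6.64.0.3", "score": 2.0},
    {"reviewCreatedVersion": "6.63.0.1", "score": 5.0},
    {"reviewCreatedVersion": None, "score": 1.0},
]

averages = mean_score_by_version(reviews)
```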

relevanceai.utils.datasets.get_ebay_app_review_encoded_dataset(number_of_documents=100, select_fields=None)

Download an example dataset of Play Store reviews of the eBay app (all encoded)

Total Len: 10000

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Return type

List

relevanceai.utils.datasets.get_ecommerce_1_dataset(number_of_documents=739, select_fields=None)

Download an example e-commerce dataset

Total Len: 739

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_image_clip_vector_': [...],
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'product_title_clip_vector_': [...],
    'query': 'steel necklace',
    'source': 'eBay'
}
Return type

List[Dict[Any, Any]]

relevanceai.utils.datasets.get_ecommerce_2_dataset(number_of_documents=1000, select_fields=None)

Download an example e-commerce dataset

Total Len: 739

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': '711160239',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'query': 'steel necklace',
    'source': 'eBay'
}
relevanceai.utils.datasets.get_ecommerce_3_dataset(number_of_documents=1000, select_fields=None)

Download an example ecommerce dataset (https://data.world/crowdflower/ecommerce-search-relevance)

Total Len: 15528

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_unit_id': 711158459,
    'product_description': 'The PlayStation 4 system opens the door to an '
                        'incredible journey through immersive new gaming '
                        'worlds and a deeply connected gaming community. Step '
                        'into living, breathing worlds where you are hero of '
                        '...',
    'product_image': 'http://thumbs2.ebaystatic.com/d/l225/m/mzvzEUIknaQclZ801YCY1ew.jpg',
    'product_link': 'http://www.ebay.com/itm/Sony-PlayStation-4-PS4-Latest-Model-500-GB-Jet-Black-Console-/321459436277?pt=LH_DefaultDomain_0&hash=item4ad879baf5',
    'product_price': '$329.98 ',
    'product_title': "Sony PlayStation 4 (PS4) (Latest Model)- 500 GB Jet Black 'Console'",
    'query': 'playstation 4',
    'rank': 1,
    'relevance': 3.67,
    'relevance:variance': 0.471,
    'source': 'eBay',
    'url': 'http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0&_nkw=playstation%204'
}
Return type

List
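In the examples on this page, product_price is a string such as '$329.98 ' or '$7.99 to $12.99' rather than a number. A minimal sketch for turning such strings into floats follows; parse_prices is a hypothetical helper, and the regular expression only assumes dollar-prefixed amounts like the ones shown.

```python
import re
from typing import List

# A dollar sign, optional spaces, then digits with optional thousands
# separators and an optional decimal part.
_PRICE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_prices(text: str) -> List[float]:
    """Return every dollar amount found in the string, in order."""
    return [float(match.replace(",", "")) for match in _PRICE.findall(text)]


single = parse_prices("$329.98 ")
price_range = parse_prices("$7.99 to $12.99")
```

A single price yields a one-element list and a range yields two elements, so callers can take min() and max() without special cases.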

relevanceai.utils.datasets.get_ecommerce_dataset(number_of_documents=1000, select_fields=None)

Download an example ecommerce dataset (https://data.world/crowdflower/ecommerce-search-relevance)

Total Len: 15528

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_unit_id': 711158459,
    'product_description': 'The PlayStation 4 system opens the door to an '
                        'incredible journey through immersive new gaming '
                        'worlds and a deeply connected gaming community. Step '
                        'into living, breathing worlds where you are hero of '
                        '...',
    'product_image': 'http://thumbs2.ebaystatic.com/d/l225/m/mzvzEUIknaQclZ801YCY1ew.jpg',
    'product_link': 'http://www.ebay.com/itm/Sony-PlayStation-4-PS4-Latest-Model-500-GB-Jet-Black-Console-/321459436277?pt=LH_DefaultDomain_0&hash=item4ad879baf5',
    'product_price': '$329.98 ',
    'product_title': "Sony PlayStation 4 (PS4) (Latest Model)- 500 GB Jet Black 'Console'",
    'query': 'playstation 4',
    'rank': 1,
    'relevance': 3.67,
    'relevance:variance': 0.471,
    'source': 'eBay',
    'url': 'http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0&_nkw=playstation%204'
}
Return type

List

relevanceai.utils.datasets.get_ecommerce_dataset_clean(number_of_documents=1000, select_fields=None)

Download an example e-commerce dataset

Total Len: 739

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': '711160239',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'query': 'steel necklace',
    'source': 'eBay'
}
relevanceai.utils.datasets.get_ecommerce_dataset_encoded(number_of_documents=739, select_fields=None)

Download an example e-commerce dataset

Total Len: 739

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_image_clip_vector_': [...],
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'product_title_clip_vector_': [...],
    'query': 'steel necklace',
    'source': 'eBay'
}
Return type

List[Dict[Any, Any]]

relevanceai.utils.datasets.get_flipkart_dataset(number_of_documents=19920, select_fields=None)

Download an example Flipkart e-commerce dataset

Total Len: 19920

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': 0,
    'product_name': "Alisha Solid Women's Cycling Shorts",
    'description': "Key Features of Alisha Solid Women's Cycling Shorts Cotton Lycra Navy, Red, Navy,Specifications of Alisha Solid Women's Cycling Shorts Shorts Details Number of Contents in Sales Package Pack of 3 Fabric Cotton Lycra Type Cycling Shorts General Details Pattern Solid Ideal For Women's Fabric Care Gentle Machine Wash in Lukewarm Water, Do Not Bleach Additional Details Style Code ALTHT_3P_21 In the Box 3 shorts",
    'retail_price': 999.0
}
Return type

List

relevanceai.utils.datasets.get_games_dataset(number_of_documents=365, select_fields=None)

Download an example games dataset (https://www.freetogame.com/)

Total Len: 365

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    'id': 1,
    'title': 'Dauntless',
    'thumbnail': 'https://www.freetogame.com/g/1/thumbnail.jpg',
    'short_description': 'A free-to-play, co-op action RPG with gameplay similar to Monster Hunter.',
    'game_url': 'https://www.freetogame.com/open/dauntless',
    'genre': 'MMORPG',
    'platform': 'PC (Windows)',
    'publisher': 'Phoenix Labs',
    'developer': 'Phoenix Labs, Iron Galaxy',
    'release_date': '2019-05-21',
    'freetogame_profile_url': 'https://www.freetogame.com/dauntless'
}
Return type

List

relevanceai.utils.datasets.get_iris_dataset(number_of_documents=None, select_fields=None, shuffle=True)
Return type

List[Dict]

relevanceai.utils.datasets.get_news_dataset(number_of_documents=250, select_fields=None)

Download an example news dataset

Total Len: 250

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    'authors': 'Ruth Harris',
    'content': 'Sometimes the power of Christmas will make you do wild and wonderful things. You do not need to believe in the Holy Trinity to believe in the positive power of doing good for others.',
    'domain': 'awm.com',
    'id': 141,
    'inserted_at': '2018-02-02 01:19:41.756632',
    'keywords': nan,
    'meta_description': nan,
    'meta_keywords': "['']",
    'scraped_at': '2018-01-25 16:17:44.789555',
    'summary': nan,
    'tags': nan,
    'title': 'Church Congregation Brings Gift to Waitresses Working on Christmas Eve, Has Them Crying (video)',
    'type': 'unreliable',
    'updated_at': '2018-02-02 01:19:41.756664',
    'url': 'http://awm.com/church-congregation-brings-gift-to-waitresses-working-on-christmas-eve-has-them-crying-video/'
}
Return type

List
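Several fields in the news example are nan. NaN is not a valid value in strict JSON, so such fields usually need handling before a document is serialised. The sketch below drops them; drop_nan_fields is a hypothetical helper, and it assumes the missing values are float NaN as the example suggests.

```python
import math
from typing import Any, Dict


def drop_nan_fields(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document without float NaN values."""
    return {
        key: value
        for key, value in document.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


# A shortened document using field names and values from the example above.
article = {
    "authors": "Ruth Harris",
    "id": 141,
    "keywords": float("nan"),
    "summary": float("nan"),
    "type": "unreliable",
}

cleaned = drop_nan_fields(article)
```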

relevanceai.utils.datasets.get_online_ecommerce_dataset(number_of_documents=1000, select_fields=None)

Download an example ecommerce dataset (https://data.world/crowdflower/ecommerce-search-relevance)

Total Len: 15528

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_unit_id': 711158459,
    'product_description': 'The PlayStation 4 system opens the door to an '
                        'incredible journey through immersive new gaming '
                        'worlds and a deeply connected gaming community. Step '
                        'into living, breathing worlds where you are hero of '
                        '...',
    'product_image': 'http://thumbs2.ebaystatic.com/d/l225/m/mzvzEUIknaQclZ801YCY1ew.jpg',
    'product_link': 'http://www.ebay.com/itm/Sony-PlayStation-4-PS4-Latest-Model-500-GB-Jet-Black-Console-/321459436277?pt=LH_DefaultDomain_0&hash=item4ad879baf5',
    'product_price': '$329.98 ',
    'product_title': "Sony PlayStation 4 (PS4) (Latest Model)- 500 GB Jet Black 'Console'",
    'query': 'playstation 4',
    'rank': 1,
    'relevance': 3.67,
    'relevance:variance': 0.471,
    'source': 'eBay',
    'url': 'http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0&_nkw=playstation%204'
}
Return type

List

relevanceai.utils.datasets.get_online_retail_dataset(number_of_documents=1000, select_fields=None)

Download an example online retail dataset from the UCI Machine Learning Repository

Total Len: 541909

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    'Country': 'United Kingdom',
    'CustomerID': 17850.0,
    'Description': 'WHITE HANGING HEART T-LIGHT HOLDER',
    'InvoiceDate': Timestamp('2010-12-01 08:26:00'),
    'InvoiceNo': 536365,
    'Quantity': 6,
    'StockCode': '85123A',
    'UnitPrice': 2.55
}
Return type

List
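Each document in this dataset is one invoice line, so a common first step is summing Quantity * UnitPrice per InvoiceNo. The sketch below works on toy rows that reuse the field names from the example; only the first row's values come from it, and invoice_totals is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List


def invoice_totals(rows: List[Dict[str, Any]]) -> Dict[int, float]:
    """Sum Quantity * UnitPrice for each InvoiceNo."""
    totals: Dict[int, float] = defaultdict(float)
    for row in rows:
        totals[row["InvoiceNo"]] += row["Quantity"] * row["UnitPrice"]
    return dict(totals)


rows = [
    {"InvoiceNo": 536365, "Quantity": 6, "UnitPrice": 2.55},
    # The rows below are made up for illustration.
    {"InvoiceNo": 536365, "Quantity": 2, "UnitPrice": 1.00},
    {"InvoiceNo": 536366, "Quantity": 1, "UnitPrice": 4.00},
]

totals = invoice_totals(rows)
```

Because the amounts are floats, compare totals with a tolerance or round them for display rather than testing for exact equality.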

relevanceai.utils.datasets.get_palmer_penguins_dataset(number_of_documents=None, select_fields=None, shuffle=True)
Return type

List[Dict]

relevanceai.utils.datasets.get_realestate_dataset(number_of_documents=50, select_fields=None)

Download an example real-estate dataset

Total Len: 5885

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    'propertyDetails': {
        'area': 'North Shore - Lower',
        'carspaces': 1,
        'streetNumber': '28',
        'latitude': -33.8115768,
        'allPropertyTypes': ['ApartmentUnitFlat'],
        'postcode': '2066',
        'unitNumber': '6',
        'bathrooms': 1.0,
        'bedrooms': 1.0,
        'features': ['BuiltInWardrobes', 'InternalLaundry', 'Intercom', 'Dishwasher'],
        'street': 'Epping Road',
        'propertyType': 'ApartmentUnitFlat',
        'suburb': 'LANE COVE',
        'state': 'NSW',
        'region': 'Sydney Region',
        'displayableAddress': '6/28 Epping Road, Lane Cove',
        'longitude': 151.166611
    },
    'listingSlug': '6-28-epping-road-lane-cove-nsw-2066-14688794',
    'id': 14688794,
    'headline': 'Extra large one bedroom unit',
    'summaryDescription': '<b></b><br />This modern and spacious one-bedroom apartment situated on the top floor, the quiet rear side of a small 2 story boutique block, enjoys a wonderfully private, leafy, and greenly outlook from 2 sides and balcony. A short stroll to city buse...',
    'advertiser': 'Ray White Lane Cove',
    'image_url': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_1_1_201203_101135-w1600-h1065',
    'insert_date_': '2021-03-01T14:19:22.805086',
    'labels': [],
    'image_url_5': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_5_1_201203_101135-w1600-h1067',
    'image_url_4': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_4_1_201203_101135-w1600-h1067',
    'priceDetails': {'displayPrice': 'Deposit Taken ! Inspection Cancelled thank you !!!'},
    ...
}
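The real-estate documents are nested (propertyDetails and priceDetails are dictionaries). When a flat view is easier to work with, nested keys can be joined into dot-separated paths, the same style as keys such as '_chunk_.label' in the mock_documents schema on this page. This is a sketch only: flatten is a hypothetical helper, and the listing is a shortened version of the example.

```python
from typing import Any, Dict


def flatten(document: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in document.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the path prefix.
            flat.update(flatten(value, path, sep))
        else:
            # Lists and scalars are kept as-is.
            flat[path] = value
    return flat


listing = {
    "id": 14688794,
    "propertyDetails": {"suburb": "LANE COVE", "bedrooms": 1.0},
    "priceDetails": {"displayPrice": "Deposit Taken ! Inspection Cancelled thank you !!!"},
}

flat_listing = flatten(listing)
```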
relevanceai.utils.datasets.get_sample_ecommerce_dataset(number_of_documents=1000, select_fields=None)

Download an example e-commerce dataset

Total Len: 739

Parameters
  • number_of_documents (int) – Number of documents to download

  • select_fields (list) – Fields to include in the dataset, empty array/list means all fields.

Example

{
    '_id': '711160239',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'query': 'steel necklace',
    'source': 'eBay'
}
relevanceai.utils.datasets.get_titanic_dataset(output_format='json')

Titanic Dataset.

Example

{
    'Unnamed: 0': 0,
    'PassengerId': 892,
    'Survived': 0,
    'Pclass': 3,
    'Age': 34.5,
    'SibSp': 0,
    'Parch': 0,
    'Fare': 7.8292,
    'male': 1,
    'Q': 1,
    'S': 0,
    'value_vector_': '[3.0, 34.5, 0.0, 0.0, 7.8292, 1.0, 1.0, 0.0]'
}
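In the sample document, value_vector_ appears as a quoted string rather than a list. Assuming that is how the field is returned, it can be decoded with the standard json module. The sketch also checks an observation that holds for this one sample: the vector matches the fields Pclass, Age, SibSp, Parch, Fare, male, Q and S in that order. That ordering is inferred from the sample, not documented.

```python
import json

# The sample document from above, with value_vector_ kept as a string.
passenger = {
    "PassengerId": 892,
    "Survived": 0,
    "Pclass": 3,
    "Age": 34.5,
    "SibSp": 0,
    "Parch": 0,
    "Fare": 7.8292,
    "male": 1,
    "Q": 1,
    "S": 0,
    "value_vector_": "[3.0, 34.5, 0.0, 0.0, 7.8292, 1.0, 1.0, 0.0]",
}

# Decode the string into a list of floats.
vector = json.loads(passenger["value_vector_"])

# Assumed feature order, inferred from this single sample.
FEATURES = ["Pclass", "Age", "SibSp", "Parch", "Fare", "male", "Q", "S"]
rebuilt = [float(passenger[name]) for name in FEATURES]
```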

relevanceai.utils.datasets.list_example_datasets()
relevanceai.utils.datasets.mock_documents(number_of_documents=100, vector_length=5)

Utility function to mock documents. Aimed at helping users reproduce errors if required. The schema for the documents is as follows:

{'_chunk_': 'chunks',
'_chunk_.label': 'text',
'_chunk_.label_chunkvector_': {'chunkvector': 5},
'insert_date_': 'date',
'sample_1_description': 'text',
'sample_1_label': 'text',
'sample_1_value': 'numeric',
'sample_1_vector_': {'vector': 5},
'sample_2_description': 'text',
'sample_2_label': 'text',
'sample_2_value': 'numeric',
'sample_2_vector_': {'vector': 5},
'sample_3_description': 'text',
'sample_3_label': 'text',
'sample_3_value': 'numeric',
'sample_3_vector_': {'vector': 5}}
Parameters
  • number_of_documents (int) – The number of documents to mock

  • vector_length (int) – The length of vectors

Example

from relevanceai.utils.datasets import mock_documents
documents = mock_documents(10)
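When mocked documents are used to reproduce an error, it can help to confirm that every vector field has the expected length. The sketch below checks top-level fields whose names end in _vector_, as in the schema above. vector_length_mismatches is a hypothetical helper, and the two documents are hand-written stand-ins rather than output of mock_documents.

```python
from typing import Any, Dict, List, Tuple


def vector_length_mismatches(
    documents: List[Dict[str, Any]], vector_length: int
) -> List[Tuple[int, str]]:
    """Return (document index, field) pairs whose vector has the wrong length."""
    problems: List[Tuple[int, str]] = []
    for index, doc in enumerate(documents):
        for field, value in doc.items():
            # Only top-level vector fields are checked in this sketch.
            if field.endswith("_vector_") and len(value) != vector_length:
                problems.append((index, field))
    return problems


# Hand-written stand-ins; the second vector is deliberately too short.
docs = [
    {"sample_1_label": "label_0", "sample_1_vector_": [0.1, 0.2, 0.3, 0.4, 0.5]},
    {"sample_1_label": "label_1", "sample_1_vector_": [0.1, 0.2, 0.3]},
]

mismatches = vector_length_mismatches(docs, vector_length=5)
```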

relevanceai.utils.datasets.select_fields_from_json(json, select_fields)