Datasets#
The Relevance AI platform offers free example datasets. These datasets are licensed under Apache 2.0.
- class relevanceai.utils.datasets.ExampleDatasets#
Bases: object
- __init__()#
- get_dataset(name, number_of_documents=None, select_fields=None)#
Download an example dataset
- Parameters
name (string) – Name of example dataset
number_of_documents (int) – Number of documents to download
select_fields (Optional[List]) – Fields to include in the dataset, empty array/list means all fields.
- list_datasets()#
List of example datasets available to download
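The `select_fields` rule documented for `get_dataset` (an empty list means all fields) can be pictured with a small stand-alone sketch. The helper `keep_fields` below is hypothetical and is not the library's implementation; it also assumes that `None` behaves the same way as an empty list.

```python
from typing import Any, Dict, List, Optional


def keep_fields(
    documents: List[Dict[str, Any]],
    select_fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in that mimics the documented select_fields rule."""
    # Empty list (or None, by assumption): keep every field, returning copies.
    if not select_fields:
        return [dict(doc) for doc in documents]
    # Otherwise keep only the requested fields that are present in each document.
    return [
        {field: doc[field] for field in select_fields if field in doc}
        for doc in documents
    ]


# A trimmed document in the shape of the e-commerce examples on this page.
docs = [{"_id": "a", "query": "steel necklace", "source": "eBay"}]
all_fields = keep_fields(docs, [])
only_query = keep_fields(docs, ["query"])
```

With real data, the same effect would come from passing `select_fields` directly to one of the download functions listed on this page.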
- relevanceai.utils.datasets.example_documents(dataset_id, number_of_documents=None)#
- relevanceai.utils.datasets.get_coco_dataset(number_of_documents=1000, include_vector=True, select_fields=None)#
Get the COCO dataset
- relevanceai.utils.datasets.get_dummy_ecommerce_dataset(number_of_documents=739, select_fields=None)#
Download an example e-commerce dataset
Total Len: 739
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9', 'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg', 'product_image_clip_vector_': [...], 'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f', 'product_price': '$7.99 to $12.99', 'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"', 'product_title_clip_vector_': [...], 'query': 'steel necklace', 'source': 'eBay' }
- Return type
List[Dict[Any, Any]]
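The `product_price` field in the sample document is a string such as `'$7.99 to $12.99'`, so numeric work needs a parsing step first. The sketch below is one possible approach; `parse_price_range` is a hypothetical helper, not part of `relevanceai`, and it assumes prices are written with a dot as the decimal separator.

```python
import re
from typing import Tuple

# Matches amounts such as 7.99 or 329.98 (digits with an optional decimal part).
_AMOUNT = re.compile(r"\d+(?:\.\d+)?")


def parse_price_range(text: str) -> Tuple[float, float]:
    """Return (lowest, highest) amount found in a price string."""
    # Thousands separators are dropped before matching.
    amounts = [float(match) for match in _AMOUNT.findall(text.replace(",", ""))]
    if not amounts:
        raise ValueError(f"no price found in {text!r}")
    return min(amounts), max(amounts)


# Price strings copied from the sample documents on this page.
low, high = parse_price_range("$7.99 to $12.99")
single = parse_price_range("$329.98 ")
```

A single price yields the same value for both ends of the range, which keeps downstream code free of special cases.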
- relevanceai.utils.datasets.get_ebay_app_review_dataset(number_of_documents=100, select_fields=None)#
Download example Play Store review data for the eBay app
Total Len: 10000
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': '4b9b92c3-011d-4f43-98ca-e131958a49f4', 'at': datetime.datetime(2022, 6, 20, 12, 22, 23), 'content': "PLEASE change the way your app works in terms of swiping through images. If you pull down, you refresh the page. If you swipe left/right, you change images. Problem is, it's far too easy to accidentally pull down while swiping left/right, which ends up resetting the gallery!!!! Please fix this! Otherwise, app works as expected.", 'repliedAt': None, 'replyContent': None, 'reviewCreatedVersion': '6.64.0.3', 'reviewId': '4b9b92c3-011d-4f43-98ca-e131958a49f4', 'score': 4.0, 'thumbsUpCount': 50, 'userImage': 'https://play-lh.googleusercontent.com/a/AATXAJwrSs35SJYs5BUzJ2blj0zJagZgUZuPfglwcT_f=mo', 'userName': 'Mitchel Wood' }
- Return type
List
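Once downloaded, review documents in this shape can be summarised with the standard library alone. The following is a minimal sketch: the first review is a trimmed copy of the sample document, the second is hypothetical and exists only so that the aggregation has more than one row.

```python
import datetime
from collections import defaultdict
from statistics import mean

reviews = [
    {  # trimmed copy of the sample document above
        "reviewId": "4b9b92c3-011d-4f43-98ca-e131958a49f4",
        "at": datetime.datetime(2022, 6, 20, 12, 22, 23),
        "reviewCreatedVersion": "6.64.0.3",
        "score": 4.0,
        "thumbsUpCount": 50,
    },
    {  # hypothetical second review, for illustration only
        "reviewId": "hypothetical-review-2",
        "at": datetime.datetime(2022, 6, 21, 9, 0, 0),
        "reviewCreatedVersion": "6.64.0.3",
        "score": 2.0,
        "thumbsUpCount": 3,
    },
]

# Collect scores per app version, then average them.
scores_by_version = defaultdict(list)
for review in reviews:
    scores_by_version[review["reviewCreatedVersion"]].append(review["score"])
average_by_version = {
    version: mean(scores) for version, scores in scores_by_version.items()
}

# The review that other users found most helpful.
most_helpful = max(reviews, key=lambda review: review["thumbsUpCount"])
```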
- relevanceai.utils.datasets.get_ebay_app_review_encoded_dataset(number_of_documents=100, select_fields=None)#
Download example Play Store review data for the eBay app (all encoded)
Total Len: 10000
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
- Return type
List
- relevanceai.utils.datasets.get_ecommerce_1_dataset(number_of_documents=739, select_fields=None)#
Download an example e-commerce dataset
Total Len: 739
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9', 'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg', 'product_image_clip_vector_': [...], 'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f', 'product_price': '$7.99 to $12.99', 'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"', 'product_title_clip_vector_': [...], 'query': 'steel necklace', 'source': 'eBay' }
- Return type
List[Dict[Any, Any]]
- relevanceai.utils.datasets.get_ecommerce_2_dataset(number_of_documents=1000, select_fields=None)#
Download an example e-commerce dataset
Total Len: 739
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': '711160239', 'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg', 'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f', 'product_price': '$7.99 to $12.99', 'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"', 'query': 'steel necklace', 'source': 'eBay' }
- relevanceai.utils.datasets.get_ecommerce_3_dataset(number_of_documents=1000, select_fields=None)#
Download an example ecommerce dataset (https://data.world/crowdflower/ecommerce-search-relevance)
Total Len: 15528
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_unit_id': 711158459, 'product_description': 'The PlayStation 4 system opens the door to an incredible journey through immersive new gaming worlds and a deeply connected gaming community. Step into living, breathing worlds where you are hero of ...', 'product_image': 'http://thumbs2.ebaystatic.com/d/l225/m/mzvzEUIknaQclZ801YCY1ew.jpg', 'product_link': 'http://www.ebay.com/itm/Sony-PlayStation-4-PS4-Latest-Model-500-GB-Jet-Black-Console-/321459436277?pt=LH_DefaultDomain_0&hash=item4ad879baf5', 'product_price': '$329.98 ', 'product_title': 'Sony PlayStation 4 (PS4) (Latest Model)- 500 GB Jet Black Console', 'query': 'playstation 4', 'rank': 1, 'relevance': 3.67, 'relevance:variance': 0.471, 'source': 'eBay', 'url': 'http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0&_nkw=playstation%204' }
- Return type
List
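Because each document carries a `query`, a `rank` and a `relevance` score, the results can be regrouped per query for search-relevance experiments. The sketch below assumes documents in the shape of the sample above; the second entry is hypothetical and its `_unit_id` is made up for illustration.

```python
from operator import itemgetter
from typing import Any, Dict, List

documents = [
    # Hypothetical lower-ranked result, listed first to show that order is restored.
    {"_unit_id": 1, "query": "playstation 4", "rank": 2, "relevance": 2.0},
    # Trimmed copy of the sample document above.
    {"_unit_id": 711158459, "query": "playstation 4", "rank": 1, "relevance": 3.67},
]

# Group results by query, ordered by their rank within each query.
hits_by_query: Dict[str, List[Dict[str, Any]]] = {}
for doc in sorted(documents, key=itemgetter("query", "rank")):
    hits_by_query.setdefault(doc["query"], []).append(doc)

# The top-ranked document id for every query.
top_unit_ids = {query: hits[0]["_unit_id"] for query, hits in hits_by_query.items()}
```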
- relevanceai.utils.datasets.get_ecommerce_dataset(number_of_documents=1000, select_fields=None)#
Download an example ecommerce dataset (https://data.world/crowdflower/ecommerce-search-relevance)
Total Len: 15528
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_unit_id': 711158459, 'product_description': 'The PlayStation 4 system opens the door to an incredible journey through immersive new gaming worlds and a deeply connected gaming community. Step into living, breathing worlds where you are hero of ...', 'product_image': 'http://thumbs2.ebaystatic.com/d/l225/m/mzvzEUIknaQclZ801YCY1ew.jpg', 'product_link': 'http://www.ebay.com/itm/Sony-PlayStation-4-PS4-Latest-Model-500-GB-Jet-Black-Console-/321459436277?pt=LH_DefaultDomain_0&hash=item4ad879baf5', 'product_price': '$329.98 ', 'product_title': 'Sony PlayStation 4 (PS4) (Latest Model)- 500 GB Jet Black Console', 'query': 'playstation 4', 'rank': 1, 'relevance': 3.67, 'relevance:variance': 0.471, 'source': 'eBay', 'url': 'http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0&_nkw=playstation%204' }
- Return type
List
- relevanceai.utils.datasets.get_ecommerce_dataset_clean(number_of_documents=1000, select_fields=None)#
Download an example e-commerce dataset
Total Len: 739
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': '711160239', 'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg', 'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f', 'product_price': '$7.99 to $12.99', 'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"', 'query': 'steel necklace', 'source': 'eBay' }
- relevanceai.utils.datasets.get_ecommerce_dataset_encoded(number_of_documents=739, select_fields=None)#
Download an example e-commerce dataset
Total Len: 739
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9', 'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg', 'product_image_clip_vector_': [...], 'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f', 'product_price': '$7.99 to $12.99', 'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"', 'product_title_clip_vector_': [...], 'query': 'steel necklace', 'source': 'eBay' }
- Return type
List[Dict[Any, Any]]
- relevanceai.utils.datasets.get_flipkart_dataset(number_of_documents=19920, select_fields=None)#
Download an example Flipkart e-commerce dataset
Total Len: 19920
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': 0, 'product_name': "Alisha Solid Women's Cycling Shorts", 'description': "Key Features of Alisha Solid Women's Cycling Shorts Cotton Lycra Navy, Red, Navy,Specifications of Alisha Solid Women's Cycling Shorts Shorts Details Number of Contents in Sales Package Pack of 3 Fabric Cotton Lycra Type Cycling Shorts General Details Pattern Solid Ideal For Women's Fabric Care Gentle Machine Wash in Lukewarm Water, Do Not Bleach Additional Details Style Code ALTHT_3P_21 In the Box 3 shorts", 'retail_price': 999.0 }
- Return type
List
- relevanceai.utils.datasets.get_games_dataset(number_of_documents=365, select_fields=None)#
Download an example games dataset (https://www.freetogame.com/)
Total Len: 365
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ 'id': 1, 'title': 'Dauntless', 'thumbnail': 'https://www.freetogame.com/g/1/thumbnail.jpg', 'short_description': 'A free-to-play, co-op action RPG with gameplay similar to Monster Hunter.', 'game_url': 'https://www.freetogame.com/open/dauntless', 'genre': 'MMORPG', 'platform': 'PC (Windows)', 'publisher': 'Phoenix Labs', 'developer': 'Phoenix Labs, Iron Galaxy', 'release_date': '2019-05-21', 'freetogame_profile_url': 'https://www.freetogame.com/dauntless' }
- Return type
List
- relevanceai.utils.datasets.get_iris_dataset(number_of_documents=None, select_fields=None, shuffle=True)#
- Return type
List[Dict]
- relevanceai.utils.datasets.get_news_dataset(number_of_documents=250, select_fields=None)#
Download an example news dataset
Total Len: 250
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ 'authors': 'Ruth Harris', 'content': 'Sometimes the power of Christmas will make you do wild and wonderful things. You do not need to believe in the Holy Trinity to believe in the positive power of doing good for others.', 'domain': 'awm.com', 'id': 141, 'inserted_at': '2018-02-02 01:19:41.756632', 'keywords': nan, 'meta_description': nan, 'meta_keywords': "['']", 'scraped_at': '2018-01-25 16:17:44.789555', 'summary': nan, 'tags': nan, 'title': 'Church Congregation Brings Gift to Waitresses Working on Christmas Eve, Has Them Crying (video)', 'type': 'unreliable', 'updated_at': '2018-02-02 01:19:41.756664', 'url': 'http://awm.com/church-congregation-brings-gift-to-waitresses-working-on-christmas-eve-has-them-crying-video/' }
- Return type
List
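Several fields in the sample document (`keywords`, `summary`, `tags`) are shown as `nan`. Assuming these arrive as float NaN values, which the sample suggests but does not confirm, they can be removed before further processing. The helper `drop_missing` below is hypothetical and only sketches the idea.

```python
import math
from typing import Any, Dict


def drop_missing(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document without fields whose value is a float NaN."""
    return {
        key: value
        for key, value in document.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


nan = float("nan")

# Trimmed copy of the sample document above.
article = {
    "authors": "Ruth Harris",
    "domain": "awm.com",
    "id": 141,
    "keywords": nan,
    "summary": nan,
    "type": "unreliable",
}
cleaned = drop_missing(article)
```

The `isinstance` check comes first because `math.isnan` raises a `TypeError` for strings.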
- relevanceai.utils.datasets.get_online_ecommerce_dataset(number_of_documents=1000, select_fields=None)#
Download an example ecommerce dataset (https://data.world/crowdflower/ecommerce-search-relevance)
Total Len: 15528
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_unit_id': 711158459, 'product_description': 'The PlayStation 4 system opens the door to an incredible journey through immersive new gaming worlds and a deeply connected gaming community. Step into living, breathing worlds where you are hero of ...', 'product_image': 'http://thumbs2.ebaystatic.com/d/l225/m/mzvzEUIknaQclZ801YCY1ew.jpg', 'product_link': 'http://www.ebay.com/itm/Sony-PlayStation-4-PS4-Latest-Model-500-GB-Jet-Black-Console-/321459436277?pt=LH_DefaultDomain_0&hash=item4ad879baf5', 'product_price': '$329.98 ', 'product_title': 'Sony PlayStation 4 (PS4) (Latest Model)- 500 GB Jet Black Console', 'query': 'playstation 4', 'rank': 1, 'relevance': 3.67, 'relevance:variance': 0.471, 'source': 'eBay', 'url': 'http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0&_nkw=playstation%204' }
- Return type
List
- relevanceai.utils.datasets.get_online_retail_dataset(number_of_documents=1000, select_fields=None)#
Download an example online retail dataset from the UCI Machine Learning Repository
Total Len: 541909
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ 'Country': 'United Kingdom', 'CustomerID': 17850.0, 'Description': 'WHITE HANGING HEART T-LIGHT HOLDER', 'InvoiceDate': Timestamp('2010-12-01 08:26:00'), 'InvoiceNo': 536365, 'Quantity': 6, 'StockCode': '85123A', 'UnitPrice': 2.55 }
- Return type
List
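Each document is one invoice line, so a common first step is to total `Quantity * UnitPrice` per `InvoiceNo`. The sketch below uses a trimmed copy of the sample document plus one hypothetical line with a made-up stock code; `InvoiceDate` is omitted so the example needs no third-party packages.

```python
from collections import defaultdict
from typing import Dict

lines = [
    # Trimmed copy of the sample document above.
    {"InvoiceNo": 536365, "StockCode": "85123A", "Quantity": 6, "UnitPrice": 2.55},
    # Hypothetical second line on the same invoice, for illustration only.
    {"InvoiceNo": 536365, "StockCode": "HYPOTHETICAL", "Quantity": 2, "UnitPrice": 3.39},
]

# Accumulate the value of every line under its invoice number.
running_totals: Dict[int, float] = defaultdict(float)
for line in lines:
    running_totals[line["InvoiceNo"]] += line["Quantity"] * line["UnitPrice"]

# Round once at the end to hide binary floating-point noise.
invoice_totals = {invoice: round(total, 2) for invoice, total in running_totals.items()}
```

For exact currency arithmetic, `decimal.Decimal` would be a safer choice than floats.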
- relevanceai.utils.datasets.get_palmer_penguins_dataset(number_of_documents=None, select_fields=None, shuffle=True)#
- Return type
List[Dict]
- relevanceai.utils.datasets.get_realestate_dataset(number_of_documents=50, select_fields=None)#
Download an example real-estate dataset
Total Len: 5885
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ 'propertyDetails': {'area': 'North Shore - Lower', 'carspaces': 1, 'streetNumber': '28', 'latitude': -33.8115768, 'allPropertyTypes': ['ApartmentUnitFlat'], 'postcode': '2066', 'unitNumber': '6', 'bathrooms': 1.0, 'bedrooms': 1.0, 'features': ['BuiltInWardrobes', 'InternalLaundry','Intercom', 'Dishwasher'], 'street': 'Epping Road', 'propertyType': 'ApartmentUnitFlat', 'suburb': 'LANE COVE', 'state': 'NSW', 'region': 'Sydney Region', 'displayableAddress': '6/28 Epping Road, Lane Cove', 'longitude': 151.166611}, 'listingSlug': '6-28-epping-road-lane-cove-nsw-2066-14688794', 'id': 14688794, 'headline': 'Extra large one bedroom unit', 'summaryDescription': '<b></b><br />This modern and spacious one-bedroom apartment situated on the top floor, the quiet rear side of a small 2 story boutique block, enjoys a wonderfully private, leafy, and greenly outlook from 2 sides and balcony. A short stroll to city buse...', 'advertiser': 'Ray White Lane Cove', 'image_url': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_1_1_201203_101135-w1600-h1065', 'insert_date_': '2021-03-01T14:19:22.805086', 'labels': [], 'image_url_5': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_5_1_201203_101135-w1600-h1067', 'image_url_4': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_4_1_201203_101135-w1600-h1067', 'priceDetails': {'displayPrice': 'Deposit Taken ! Inspection Cancelled thank you !!!'} ... }
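The real-estate documents are nested, for example `propertyDetails` holds the suburb and postcode. A small dotted-path accessor can make such fields easier to read. `get_nested` below is a hypothetical helper written for this sketch, not a `relevanceai` function.

```python
from typing import Any, Dict


def get_nested(document: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'propertyDetails.suburb' through nested dicts."""
    current: Any = document
    for key in path.split("."):
        # Stop early if the path leaves the nested dictionaries.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed copy of the sample document above.
listing = {
    "propertyDetails": {
        "suburb": "LANE COVE",
        "postcode": "2066",
        "bedrooms": 1.0,
        "features": ["BuiltInWardrobes", "InternalLaundry", "Intercom", "Dishwasher"],
    },
    "id": 14688794,
    "headline": "Extra large one bedroom unit",
}

suburb = get_nested(listing, "propertyDetails.suburb")
bedrooms = get_nested(listing, "propertyDetails.bedrooms")
# 'landArea' is not in the sample; it is used here only to show the default.
land_area = get_nested(listing, "propertyDetails.landArea", default="n/a")
```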
- relevanceai.utils.datasets.get_sample_ecommerce_dataset(number_of_documents=1000, select_fields=None)#
Download an example e-commerce dataset
Total Len: 739
- Parameters
number_of_documents (int) – Number of documents to download
select_fields (list) – Fields to include in the dataset, empty array/list means all fields.
Example
{ '_id': '711160239', 'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg', 'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f', 'product_price': '$7.99 to $12.99', 'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"', 'query': 'steel necklace', 'source': 'eBay' }
- relevanceai.utils.datasets.get_titanic_dataset(output_format='json')#
Titanic Dataset.
Example
{ 'Unnamed: 0': 0, 'PassengerId': 892, 'Survived': 0, 'Pclass': 3, 'Age': 34.5, 'SibSp': 0, 'Parch': 0, 'Fare': 7.8292, 'male': 1, 'Q': 1, 'S': 0, 'value_vector_': '[3.0, 34.5, 0.0, 0.0, 7.8292, 1.0, 1.0, 0.0]' }
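In the sample document, `value_vector_` appears as a quoted string rather than a list. If the field really does arrive as a string (this may only be a rendering artifact of the documentation), it can be decoded before use. `parse_vector` is a hypothetical helper that accepts either form.

```python
import json
from typing import List, Sequence, Union


def parse_vector(value: Union[str, Sequence[float]]) -> List[float]:
    """Return the vector as a list of floats, decoding a JSON string if needed."""
    if isinstance(value, str):
        value = json.loads(value)
    return [float(component) for component in value]


# Trimmed copy of the sample document above.
passenger = {
    "PassengerId": 892,
    "Survived": 0,
    "value_vector_": "[3.0, 34.5, 0.0, 0.0, 7.8292, 1.0, 1.0, 0.0]",
}
vector = parse_vector(passenger["value_vector_"])
```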
- relevanceai.utils.datasets.list_example_datasets()#
- relevanceai.utils.datasets.mock_documents(number_of_documents=100, vector_length=5)#
Utility function to mock documents. Aimed at helping users reproduce errors if required. The schema for the documents is as follows:
{'_chunk_': 'chunks', '_chunk_.label': 'text', '_chunk_.label_chunkvector_': {'chunkvector': 5}, 'insert_date_': 'date', 'sample_1_description': 'text', 'sample_1_label': 'text', 'sample_1_value': 'numeric', 'sample_1_vector_': {'vector': 5}, 'sample_2_description': 'text', 'sample_2_label': 'text', 'sample_2_value': 'numeric', 'sample_2_vector_': {'vector': 5}, 'sample_3_description': 'text', 'sample_3_label': 'text', 'sample_3_value': 'numeric', 'sample_3_vector_': {'vector': 5}}
- Parameters
number_of_documents (int) – The number of documents to mock
vector_length (int) – The length of vectors
Example
from relevanceai.utils.datasets import mock_documents
documents = mock_documents(10)
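When reproducing an error with mocked documents, it can help to confirm that every vector field has the expected `vector_length`. The sketch below does not call the library: `fake_document` is a hypothetical stand-in that builds only some top-level fields of the schema above, and `vectors_have_length` is an illustrative check that would apply equally to real output from `mock_documents`.

```python
import random
from typing import Any, Dict, List


def fake_document(vector_length: int, rng: random.Random) -> Dict[str, Any]:
    """Hypothetical stand-in for one mocked document (top-level fields only)."""
    document: Dict[str, Any] = {}
    for index in (1, 2, 3):
        document[f"sample_{index}_label"] = f"label_{rng.randint(0, 4)}"
        document[f"sample_{index}_value"] = rng.randint(0, 100)
    # The schema lists vector fields for sample_1 and sample_2 only.
    for index in (1, 2):
        document[f"sample_{index}_vector_"] = [
            rng.random() for _ in range(vector_length)
        ]
    return document


def vectors_have_length(document: Dict[str, Any], vector_length: int) -> bool:
    """Check that every top-level *_vector_ field has the expected length."""
    return all(
        len(value) == vector_length
        for key, value in document.items()
        if key.endswith("_vector_")
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
documents: List[Dict[str, Any]] = [fake_document(5, rng) for _ in range(10)]
all_ok = all(vectors_have_length(doc, 5) for doc in documents)
```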
- relevanceai.utils.datasets.select_fields_from_json(json, select_fields)#