Aggregation QuickStart#
Installation#
# remove `!` if running the line in a terminal
!pip install -U RelevanceAI[notebook]==2.0.0
Setup#
You can sign up or log in and find your credentials here:
https://cloud.tryrelevance.com/sdk/api
Once you have signed up, copy the value under Activation token
and paste it at the prompt that appears when you create the client.
from relevanceai import Client
client = Client()
Activation token (you can find it here: https://cloud.tryrelevance.com/sdk/api )
Activation Token: ··········
Connecting to us-east-1...
You can view all your datasets at https://cloud.tryrelevance.com/datasets/
Welcome to RelevanceAI. Logged in as 334fe5fb667b3a64dada.
Data#
import pandas as pd
from relevanceai.utils.datasets import get_realestate_dataset
# Retrieve our sample dataset, which comes as a list of documents (dictionaries).
documents = get_realestate_dataset()
# ToDo: Remove this cell when the dataset is updated
for d in documents:
    if "_clusters_" in d:
        del d["_clusters_"]
pd.DataFrame.from_dict(documents).head()
| | image_url_4_vector_ | hasFloorplan | image_url_vector_ | listingType | image_url_2_vector_ | image_url_2 | propertyDetails | listingSlug | id | headline | ... | image_url_5_clip_vector_ | image_url_2_label | image_url_4_label | image_url_2_clip_vector_ | image_url_4_clip_vector_ | image_url_5_label | image_url_clip_vector_ | image_url_label | _cluster_ | _id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | False | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | Rent | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | https://bucket-api.domain.com.au/v1/bucket/ima... | {'area': 'Eastern Suburbs', 'carspaces': 2, 's... | 407-39-kent-street-mascot-nsw-2020-14806988 | 14806988 | Stunning & Modern Two Bedroom Apartment | ... | [-0.4681514799594879, 0.08181382715702057, 0.1... | hoosegow | clubrooms | [-0.4723101556301117, 0.012517078779637814, -0... | [-0.6319758296012878, 0.1783788651227951, 0.13... | mudrooms | [-0.37417566776275635, 0.05725931376218796, -0... | showrooms | {'image_url_vector_': {'default': 0}, 'image_t... | -0JggHcBgSy8FC2yCzRU |
1 | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | False | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | Rent | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | https://bucket-api.domain.com.au/v1/bucket/ima... | {'area': 'Eastern Suburbs', 'streetNumber': '2... | 2-256-new-south-head-double-bay-nsw-2028-14816127 | 14816127 | Two Bedrooms Apartments just newly renovated | ... | [-0.4457785189151764, 0.14002937078475952, -0.... | viewings | mudroom | [-0.37797173857688904, 0.04217493161559105, -0... | [-0.6865466833114624, 0.19351454079151154, 0.1... | mudroom | [-0.5267254114151001, 0.22717250883579254, -0.... | appartements | {'image_url_vector_': {'default': 0}, 'image_t... | -0JggHcBgSy8FC2yCzVU |
2 | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | True | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | Rent | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | https://bucket-api.domain.com.au/v1/bucket/ima... | {'area': 'Eastern Suburbs', 'streetNumber': '1... | 19-11-21-flinders-street-surry-hills-nsw-2010-... | 14842628 | Iconic lifestyle pad in Urbis building | ... | [-0.06582163274288177, 0.10252979397773743, 0.... | appartements | backsplash | [0.060137778520584106, 0.31164053082466125, 0.... | [-0.20558945834636688, 0.6132649183273315, 0.0... | serigraph | [-0.2266240119934082, 0.3205014765262604, 0.19... | appartements | {'image_url_vector_': {'default': 0}, 'image_t... | -0JggHcBgSy8FC2ykDbk |
3 | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | False | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | Rent | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | https://bucket-api.domain.com.au/v1/bucket/ima... | {'area': 'Inner West', 'streetNumber': '13', '... | 13-formosa-st-drummoyne-nsw-2047-14828984 | 14828984 | Heritage Semi to rent | ... | [-0.334237277507782, 0.140365868806839, -0.236... | kitchen | entryway | [-0.32477402687072754, 0.4767194986343384, 0.1... | [0.12064582854509354, 0.3271999657154083, -0.2... | appartements | [-0.11818409711122513, 0.09542372077703476, -0... | pub | {'image_url_vector_': {'default': 0}, 'image_t... | -0JggHcBgSy8FC2ykDfk |
4 | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | False | [0.0394604466855526, 0, 5.5613274574279785, 0.... | Rent | [0.24612084031105042, 0.347802996635437, 0.574... | https://bucket-api.domain.com.au/v1/bucket/ima... | {'area': 'St George', 'carspaces': 1, 'streetN... | 103-11-17-woodville-street-hurstville-nsw-2220... | 14741619 | UNIQUE APARTMENT IN PRIME LOCATION | ... | [-0.3391430079936981, 0.024984989315271378, -0... | kitchen | sideman | [-0.3949810862541199, 0.3241899311542511, -0.1... | [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... | vitrine | [-0.28189733624458313, 0.061684366315603256, -... | cornlofts | {'image_url_vector_': {'default': 5}, 'image_t... | -0JhgHcBgSy8FC2y9TjX |
5 rows × 34 columns
ds = client.Dataset("quickstart_aggregation")
ds.insert_documents(documents)
while inserting, you can visit your dashboard at https://cloud.tryrelevance.com/dataset/quickstart_aggregation/dashboard/monitor/
All documents inserted/edited successfully.
1. Grouping the Data#
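In general, each group-by entry is a dictionary with an alias ("name"), the document field to group on ("field"), and the type of grouping ("agg", either "category" or "numeric" in the examples below), structured as: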
{"name": ALIAS,
"field": FIELD,
"agg": TYPE-OF-GROUP}
Categorical Data#
location_group = {
"name": "location",
"field": "propertyDetails.area",
"agg": "category",
}
Numerical Data#
bedrooms_group = {
"name": "bedrooms",
"field": "propertyDetails.bedrooms",
"agg": "numeric",
}
Putting it Together#
groupby = [location_group, bedrooms_group]
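Because every group-by entry shares the same three keys, a small helper can build and sanity-check them. The sketch below is illustrative only: `make_groupby` and `SHOWN_GROUP_TYPES` are hypothetical names, not part of the RelevanceAI SDK, and the check only accepts the two group types shown in this guide (the API may well accept others).

```python
# Hypothetical helper, not part of the RelevanceAI SDK.
# Only the group types demonstrated in this guide are accepted here.
SHOWN_GROUP_TYPES = ("category", "numeric")


def make_groupby(name, field, agg):
    """Build one entry of the form {"name": ALIAS, "field": FIELD, "agg": TYPE-OF-GROUP}."""
    if agg not in SHOWN_GROUP_TYPES:
        raise ValueError(f"agg should be one of {SHOWN_GROUP_TYPES}, got {agg!r}")
    return {"name": name, "field": field, "agg": agg}


# Same two entries as above, built through the helper.
groupby = [
    make_groupby("location", "propertyDetails.area", "category"),
    make_groupby("bedrooms", "propertyDetails.bedrooms", "numeric"),
]
print(groupby)
```

A misspelled type such as `"categorical"` then fails early with a `ValueError` instead of being sent to the API.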
2. Creating Aggregation Metrics#
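In general, each aggregation metric is a dictionary with an alias ("name"), the document field to aggregate ("field"), and the type of aggregation ("agg", for example "avg", "max", "min" or "sum"), structured as: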
{"name": ALIAS,
"field": FIELD,
"agg": TYPE-OF-AGG}
Average, Minimum and Maximum#
avg_price_metric = {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"}
max_price_metric = {"name": "max_price", "field": "priceDetails.price", "agg": "max"}
min_price_metric = {"name": "min_price", "field": "priceDetails.price", "agg": "min"}
Sum#
sum_bathroom_metric = {
"name": "bathroom_sum",
"field": "propertyDetails.bathrooms",
"agg": "sum",
}
Putting it Together#
metrics = [avg_price_metric, max_price_metric, min_price_metric, sum_bathroom_metric]
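The aliases given in "name" end up as column headings in the aggregation result, next to a frequency column. It can therefore help to check that they are unique before running the query. The following is a minimal sketch: `check_aliases` is a hypothetical helper, and treating "frequency" as a reserved alias is an assumption based on the result table in this guide, not documented SDK behaviour.

```python
# Hypothetical helper, not part of the RelevanceAI SDK.
def check_aliases(groupby, metrics, reserved=("frequency",)):
    """Return all aliases; raise ValueError on duplicates or reserved names."""
    names = [entry["name"] for entry in list(groupby) + list(metrics)]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    clashes = sorted(set(names) & set(reserved))
    if duplicates or clashes:
        raise ValueError(
            f"duplicate aliases: {duplicates}; reserved aliases: {clashes}"
        )
    return names


aliases = check_aliases(
    groupby=[{"name": "location", "field": "propertyDetails.area", "agg": "category"}],
    metrics=[{"name": "avg_price", "field": "priceDetails.price", "agg": "avg"}],
)
print(aliases)  # ['location', 'avg_price']
```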
3. Combining Grouping and Aggregating#
results = ds.aggregate(metrics=metrics, groupby=groupby)
from jsonshower import show_json
show_json(results, text_fields=list(results["results"][0].keys()))
| | frequency | location | bedrooms | avg_price | max_price | min_price | bathroom_sum |
---|---|---|---|---|---|---|---|
0 | 10 | Eastern Suburbs | 2 | 670.000000 | 780.0 | 580.0 | 17 |
1 | 8 | Eastern Suburbs | 1 | 554.000000 | 670.0 | 450.0 | 8 |
2 | 3 | Eastern Suburbs | 3 | 850.000000 | 900.0 | 800.0 | 5 |
3 | 9 | North Shore - Lower | 1 | 516.666667 | 600.0 | 450.0 | 9 |
4 | 7 | North Shore - Lower | 2 | 525.000000 | 525.0 | 525.0 | 9 |
5 | 2 | North Shore - Lower | 3 | 900.000000 | 900.0 | 900.0 | 4 |
6 | 8 | Inner West | 2 | 447.500000 | 495.0 | 400.0 | 11 |
7 | 4 | Inner West | 1 | NaN | NaN | NaN | 4 |
8 | 3 | Inner West | 3 | 1070.000000 | 1070.0 | 1070.0 | 7 |
9 | 1 | Inner West | 4 | NaN | NaN | NaN | 1 |
10 | 5 | Northern Suburbs | 1 | 460.000000 | 500.0 | 420.0 | 5 |
11 | 5 | Northern Suburbs | 2 | NaN | NaN | NaN | 8 |
12 | 3 | Northern Suburbs | 3 | 620.000000 | 680.0 | 560.0 | 6 |
13 | 1 | Northern Suburbs | 4 | NaN | NaN | NaN | 1 |
14 | 4 | St George | 2 | 370.000000 | 370.0 | 370.0 | 5 |
15 | 2 | St George | 1 | 340.000000 | 350.0 | 330.0 | 2 |
16 | 2 | St George | 3 | 640.000000 | 700.0 | 580.0 | 4 |
17 | 2 | St George | 4 | 700.000000 | 700.0 | 700.0 | 4 |
18 | 4 | Sydney City | 2 | NaN | NaN | NaN | 6 |
19 | 3 | Sydney City | 1 | NaN | NaN | NaN | 3 |
20 | 1 | Sydney City | 3 | NaN | NaN | NaN | 2 |
21 | 3 | Parramatta | 2 | 450.000000 | 450.0 | 450.0 | 5 |
22 | 1 | Parramatta | 1 | 430.000000 | 430.0 | 430.0 | 1 |
23 | 3 | Canterbury/Bankstown | 2 | 300.000000 | 300.0 | 300.0 | 3 |
24 | 1 | Hills | 4 | NaN | NaN | NaN | 2 |
25 | 1 | Northern Beaches | 3 | NaN | NaN | NaN | 2 |
26 | 1 | Western Sydney | 2 | NaN | NaN | NaN | 2 |
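To make the result table easier to read, the sketch below reproduces the same group-then-aggregate logic locally in plain Python on a few made-up documents. It is an illustration of the idea, not the service's implementation: `get_nested`, `REDUCERS` and `local_aggregate` are hypothetical names, the toy listings are invented, and reporting `None` for a group with no usable values is an assumption about why some rows above show NaN.

```python
from collections import defaultdict
from statistics import mean


def get_nested(doc, dotted_field):
    """Follow a dotted path such as "propertyDetails.area"; None if absent."""
    value = doc
    for key in dotted_field.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Reducers for the metric types used in this guide.
REDUCERS = {"avg": mean, "max": max, "min": min, "sum": sum}


def local_aggregate(documents, metrics, groupby):
    """Group documents by the group-by fields, then reduce each metric per group."""
    groups = defaultdict(list)
    for doc in documents:
        key = tuple(get_nested(doc, g["field"]) for g in groupby)
        groups[key].append(doc)

    rows = []
    for key, docs in groups.items():
        row = {"frequency": len(docs)}
        row.update({g["name"]: k for g, k in zip(groupby, key)})
        for m in metrics:
            found = (get_nested(d, m["field"]) for d in docs)
            values = [v for v in found if v is not None]
            # Assumption: a group with no usable values yields None (NaN in a table).
            row[m["name"]] = REDUCERS[m["agg"]](values) if values else None
        rows.append(row)
    return rows


# Made-up toy listings with the same nested layout as the sample dataset.
toy_documents = [
    {"propertyDetails": {"area": "Inner West", "bedrooms": 2, "bathrooms": 1},
     "priceDetails": {"price": 400}},
    {"propertyDetails": {"area": "Inner West", "bedrooms": 2, "bathrooms": 2},
     "priceDetails": {"price": 500}},
    {"propertyDetails": {"area": "Inner West", "bedrooms": 1, "bathrooms": 1}},
    {"propertyDetails": {"area": "St George", "bedrooms": 1, "bathrooms": 1},
     "priceDetails": {"price": 330}},
]

toy_groupby = [
    {"name": "location", "field": "propertyDetails.area", "agg": "category"},
    {"name": "bedrooms", "field": "propertyDetails.bedrooms", "agg": "numeric"},
]
toy_metrics = [
    {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"},
    {"name": "bathroom_sum", "field": "propertyDetails.bathrooms", "agg": "sum"},
]

rows = local_aggregate(toy_documents, toy_metrics, toy_groupby)
for row in rows:
    print(row)
```

Each output row corresponds to one combination of group-by values, with frequency counting the documents in that group. The one-bedroom Inner West toy listing has no price, so its avg_price comes out empty while its bathroom_sum is still computed.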