๐Ÿ˜๏ธ Aggregation QuickStart#

Installation#

# remove `!` if running the line in a terminal
!pip install -U RelevanceAI[notebook]==2.0.0

Setup#

You can sign up or log in and find your credentials at https://cloud.tryrelevance.com/sdk/api. Once you have signed up, click the value under Activation token and paste it at the prompt shown when the client is created.

from relevanceai import Client

client = Client()
Activation token (you can find it here: https://cloud.tryrelevance.com/sdk/api )

Activation Token: ··········
Connecting to us-east-1...
You can view all your datasets at https://cloud.tryrelevance.com/datasets/
Welcome to RelevanceAI. Logged in as 334fe5fb667b3a64dada.

Data#

import pandas as pd
from relevanceai.utils.datasets import get_realestate_dataset

# Retrieve the sample dataset, which comes as a list of documents.
documents = get_realestate_dataset()

# ToDo: Remove this cell when the dataset is updated

for d in documents:
    if "_clusters_" in d:
        del d["_clusters_"]

pd.DataFrame.from_dict(documents).head()
image_url_4_vector_ hasFloorplan image_url_vector_ listingType image_url_2_vector_ image_url_2 propertyDetails listingSlug id headline ... image_url_5_clip_vector_ image_url_2_label image_url_4_label image_url_2_clip_vector_ image_url_4_clip_vector_ image_url_5_label image_url_clip_vector_ image_url_label _cluster_ _id
0 [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... False [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... Rent [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... https://bucket-api.domain.com.au/v1/bucket/ima... {'area': 'Eastern Suburbs', 'carspaces': 2, 's... 407-39-kent-street-mascot-nsw-2020-14806988 14806988 Stunning & Modern Two Bedroom Apartment ... [-0.4681514799594879, 0.08181382715702057, 0.1... hoosegow clubrooms [-0.4723101556301117, 0.012517078779637814, -0... [-0.6319758296012878, 0.1783788651227951, 0.13... mudrooms [-0.37417566776275635, 0.05725931376218796, -0... showrooms {'image_url_vector_': {'default': 0}, 'image_t... -0JggHcBgSy8FC2yCzRU
1 [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... False [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... Rent [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... https://bucket-api.domain.com.au/v1/bucket/ima... {'area': 'Eastern Suburbs', 'streetNumber': '2... 2-256-new-south-head-double-bay-nsw-2028-14816127 14816127 Two Bedrooms Apartments just newly renovated ... [-0.4457785189151764, 0.14002937078475952, -0.... viewings mudroom [-0.37797173857688904, 0.04217493161559105, -0... [-0.6865466833114624, 0.19351454079151154, 0.1... mudroom [-0.5267254114151001, 0.22717250883579254, -0.... appartements {'image_url_vector_': {'default': 0}, 'image_t... -0JggHcBgSy8FC2yCzVU
2 [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... True [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... Rent [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... https://bucket-api.domain.com.au/v1/bucket/ima... {'area': 'Eastern Suburbs', 'streetNumber': '1... 19-11-21-flinders-street-surry-hills-nsw-2010-... 14842628 Iconic lifestyle pad in Urbis building ... [-0.06582163274288177, 0.10252979397773743, 0.... appartements backsplash [0.060137778520584106, 0.31164053082466125, 0.... [-0.20558945834636688, 0.6132649183273315, 0.0... serigraph [-0.2266240119934082, 0.3205014765262604, 0.19... appartements {'image_url_vector_': {'default': 0}, 'image_t... -0JggHcBgSy8FC2ykDbk
3 [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... False [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... Rent [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... https://bucket-api.domain.com.au/v1/bucket/ima... {'area': 'Inner West', 'streetNumber': '13', '... 13-formosa-st-drummoyne-nsw-2047-14828984 14828984 Heritage Semi to rent ... [-0.334237277507782, 0.140365868806839, -0.236... kitchen entryway [-0.32477402687072754, 0.4767194986343384, 0.1... [0.12064582854509354, 0.3271999657154083, -0.2... appartements [-0.11818409711122513, 0.09542372077703476, -0... pub {'image_url_vector_': {'default': 0}, 'image_t... -0JggHcBgSy8FC2ykDfk
4 [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... False [0.0394604466855526, 0, 5.5613274574279785, 0.... Rent [0.24612084031105042, 0.347802996635437, 0.574... https://bucket-api.domain.com.au/v1/bucket/ima... {'area': 'St George', 'carspaces': 1, 'streetN... 103-11-17-woodville-street-hurstville-nsw-2220... 14741619 UNIQUE APARTMENT IN PRIME LOCATION ... [-0.3391430079936981, 0.024984989315271378, -0... kitchen sideman [-0.3949810862541199, 0.3241899311542511, -0.1... [1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-07, 1e-... vitrine [-0.28189733624458313, 0.061684366315603256, -... cornlofts {'image_url_vector_': {'default': 5}, 'image_t... -0JhgHcBgSy8FC2y9TjX

5 rows × 34 columns

ds = client.Dataset("quickstart_aggregation")
ds.insert_documents(documents)
while inserting, you can visit your dashboard at https://cloud.tryrelevance.com/dataset/quickstart_aggregation/dashboard/monitor/
✅ All documents inserted/edited successfully.

1. Grouping the Data#

In general, each group-by entry is a dictionary with the following structure:

{"name": ALIAS,
"field": FIELD,
"agg": TYPE-OF-GROUP}

Categorical Data#

location_group = {
    "name": "location",
    "field": "propertyDetails.area",
    "agg": "category",
}
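
The `field` value `propertyDetails.area` appears to use dot notation to reach a key nested inside the `propertyDetails` dictionary seen in the sample rows above. As a rough illustration of that lookup, the sketch below resolves a dotted path against a plain Python dictionary. The helper `get_nested` is hypothetical and is not part of the RelevanceAI SDK; how the service resolves fields internally is an assumption here.

```python
from typing import Any


def get_nested(document: dict, dotted_field: str, default: Any = None) -> Any:
    """Walk a dot-separated path such as 'propertyDetails.area' through nested dicts.

    Hypothetical helper for illustration only; not part of the RelevanceAI SDK.
    """
    current: Any = document
    for key in dotted_field.split("."):
        # Stop early if the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# A trimmed-down document shaped like the sample rows shown earlier.
doc = {"propertyDetails": {"area": "Eastern Suburbs", "carspaces": 2}}

area = get_nested(doc, "propertyDetails.area")
missing = get_nested(doc, "priceDetails.price")

print(area)     # Eastern Suburbs
print(missing)  # None
```

Returning a default for a missing path, rather than raising, mirrors the fact that not every document in the sample carries every nested key.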

Numerical Data#

bedrooms_group = {
    "name": "bedrooms",
    "field": "propertyDetails.bedrooms",
    "agg": "numeric",
}

Putting it Together#

groupby = [location_group, bedrooms_group]

2. Creating Aggregation Metrics#

In general, each aggregation metric is a dictionary with the following structure:

{"name": ALIAS,
"field": FIELD,
"agg": TYPE-OF-AGG}

Average, Minimum and Maximum#

avg_price_metric = {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"}
max_price_metric = {"name": "max_price", "field": "priceDetails.price", "agg": "max"}
min_price_metric = {"name": "min_price", "field": "priceDetails.price", "agg": "min"}

Sum#

sum_bathroom_metric = {
    "name": "bathroom_sum",
    "field": "propertyDetails.bathrooms",
    "agg": "sum",
}

Putting it Together#

metrics = [avg_price_metric, max_price_metric, min_price_metric, sum_bathroom_metric]
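
Because each metric is a plain dictionary, a typo in a key or aggregation type may only surface once the request is sent. The sketch below shows one way to check a metric list locally first. Both `check_metric` and `check_unique_names` are hypothetical helpers, not SDK functions, and the set of accepted `agg` values is limited to the four types used in this quickstart; the service may well support more.

```python
# Aggregation types that appear in this quickstart. The service may accept
# others; this set is an assumption made for the sketch.
SEEN_METRIC_AGGS = {"avg", "max", "min", "sum"}
REQUIRED_KEYS = {"name", "field", "agg"}


def check_metric(metric: dict) -> dict:
    """Hypothetical client-side check, not part of the RelevanceAI SDK."""
    missing = REQUIRED_KEYS - metric.keys()
    if missing:
        raise ValueError(f"metric is missing keys: {sorted(missing)}")
    if metric["agg"] not in SEEN_METRIC_AGGS:
        raise ValueError(f"agg type not used in this quickstart: {metric['agg']!r}")
    return metric


def check_unique_names(metrics: list) -> list:
    """Reject repeated aliases, since each alias labels one result column."""
    names = [check_metric(m)["name"] for m in metrics]
    repeated = sorted({n for n in names if names.count(n) > 1})
    if repeated:
        raise ValueError(f"duplicate metric names: {repeated}")
    return metrics


example_metrics = [
    {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"},
    {"name": "bathroom_sum", "field": "propertyDetails.bathrooms", "agg": "sum"},
]
checked = check_unique_names(example_metrics)
print(len(checked))  # 2
```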

3. Combining Grouping and Aggregating#

results = ds.aggregate(metrics=metrics, groupby=groupby)
from jsonshower import show_json

show_json(results, text_fields=list(results["results"][0].keys()))
    frequency  location              bedrooms    avg_price  max_price  min_price  bathroom_sum
 0         10  Eastern Suburbs              2   670.000000      780.0      580.0            17
 1          8  Eastern Suburbs              1   554.000000      670.0      450.0             8
 2          3  Eastern Suburbs              3   850.000000      900.0      800.0             5
 3          9  North Shore - Lower          1   516.666667      600.0      450.0             9
 4          7  North Shore - Lower          2   525.000000      525.0      525.0             9
 5          2  North Shore - Lower          3   900.000000      900.0      900.0             4
 6          8  Inner West                   2   447.500000      495.0      400.0            11
 7          4  Inner West                   1          NaN        NaN        NaN             4
 8          3  Inner West                   3  1070.000000     1070.0     1070.0             7
 9          1  Inner West                   4          NaN        NaN        NaN             1
10          5  Northern Suburbs             1   460.000000      500.0      420.0             5
11          5  Northern Suburbs             2          NaN        NaN        NaN             8
12          3  Northern Suburbs             3   620.000000      680.0      560.0             6
13          1  Northern Suburbs             4          NaN        NaN        NaN             1
14          4  St George                    2   370.000000      370.0      370.0             5
15          2  St George                    1   340.000000      350.0      330.0             2
16          2  St George                    3   640.000000      700.0      580.0             4
17          2  St George                    4   700.000000      700.0      700.0             4
18          4  Sydney City                  2          NaN        NaN        NaN             6
19          3  Sydney City                  1          NaN        NaN        NaN             3
20          1  Sydney City                  3          NaN        NaN        NaN             2
21          3  Parramatta                   2   450.000000      450.0      450.0             5
22          1  Parramatta                   1   430.000000      430.0      430.0             1
23          3  Canterbury/Bankstown         2   300.000000      300.0      300.0             3
24          1  Hills                        4          NaN        NaN        NaN             2
25          1  Northern Beaches             3          NaN        NaN        NaN             2
26          1  Western Sydney               2          NaN        NaN        NaN             2
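
To make the result table easier to interpret, the sketch below emulates the same group-by and metric definitions locally on three made-up documents, using only the standard library. It is not how the service computes aggregations. The function `aggregate_locally`, the toy documents, and the choice to emit NaN when a group has no value for a metric are all assumptions for illustration. That last choice reflects a guess that the NaN prices above come from groups whose listings carry no numeric price.

```python
from collections import defaultdict
from statistics import mean

# Local stand-ins for the four aggregation types used in this quickstart.
REDUCERS = {"avg": mean, "max": max, "min": min, "sum": sum}


def get_nested(document, dotted_field, default=None):
    """Resolve a dotted path such as 'priceDetails.price' in nested dicts."""
    current = document
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def aggregate_locally(documents, groupby, metrics):
    """Hypothetical local emulation; not part of the RelevanceAI SDK."""
    groups = defaultdict(list)
    for doc in documents:
        # One bucket per distinct combination of group-by values.
        key = tuple(get_nested(doc, g["field"]) for g in groupby)
        groups[key].append(doc)

    rows = []
    for key, docs in groups.items():
        row = {"frequency": len(docs)}
        row.update({g["name"]: value for g, value in zip(groupby, key)})
        for m in metrics:
            values = [get_nested(d, m["field"]) for d in docs]
            values = [v for v in values if v is not None]
            # Assumption: a group with no values for a metric yields NaN.
            row[m["name"]] = REDUCERS[m["agg"]](values) if values else float("nan")
        rows.append(row)
    return rows


# Made-up documents that follow the field layout used in this quickstart.
toy_documents = [
    {"propertyDetails": {"area": "Inner West", "bedrooms": 2, "bathrooms": 1},
     "priceDetails": {"price": 400}},
    {"propertyDetails": {"area": "Inner West", "bedrooms": 2, "bathrooms": 2},
     "priceDetails": {"price": 495}},
    {"propertyDetails": {"area": "Sydney City", "bedrooms": 1, "bathrooms": 1}},
]
toy_groupby = [
    {"name": "location", "field": "propertyDetails.area", "agg": "category"},
    {"name": "bedrooms", "field": "propertyDetails.bedrooms", "agg": "numeric"},
]
toy_metrics = [
    {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"},
    {"name": "max_price", "field": "priceDetails.price", "agg": "max"},
    {"name": "bathroom_sum", "field": "propertyDetails.bathrooms", "agg": "sum"},
]

rows = aggregate_locally(toy_documents, toy_groupby, toy_metrics)
for row in rows:
    print(row)
```

Each printed row has the same shape as a row in the table above: a `frequency` count, one column per group-by alias, and one column per metric alias. The sketch does not reproduce the row ordering of the real output.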