Gathering the data
All the data gathered so far comes from Idealista.com. The whole repository is available here: GitHub repo.
Raspberry Pi
- Scraper set up on a Raspberry Pi 3B
- Triggered by a cron job every day (see the example crontab entry after this list)
- A Python script creates a JSON file with the most interesting flats to showcase on the website under the "Flats" tab
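The exact schedule lives on the Pi, but a crontab entry along these lines reproduces the daily morning run (the time, path, and wrapper script name below are illustrative assumptions, not copied from the device):

# m h dom mon dow  command — run the scraper wrapper every morning at 07:00
0 7 * * * cd /home/pi/idealista-scraper && ./run_scraper.sh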
Scraping
The Idealista API allows you to request 2,000 listings per month. You have to request a key here to be granted developer access.
class BasicParams:
    """Default search parameters sent with every Idealista API request."""
    def __init__(self):
        self.locationId = "0-EU-ES-46"   # Valencia
        self.propertyType = "homes"
        self.order = "publicationDate"
        self.locale = "es"
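For context, here is a minimal sketch of how these parameters can be turned into an actual search request. The OAuth2 token exchange, the endpoint URLs, and the parameter names below follow Idealista's public API docs as I understand them; they are assumptions for illustration, not code taken from the repo:

import requests

API_KEY = "your-api-key"        # granted with developer access
API_SECRET = "your-api-secret"

def get_token():
    # Exchange the key/secret for a short-lived OAuth2 bearer token
    resp = requests.post(
        "https://api.idealista.com/oauth/token",
        auth=(API_KEY, API_SECRET),
        data={"grant_type": "client_credentials", "scope": "read"},
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def search_page(token, page=1):
    params = BasicParams()
    resp = requests.post(
        "https://api.idealista.com/3.5/es/search",
        headers={"Authorization": f"Bearer {token}"},
        data={
            "operation": "sale",
            "locationId": params.locationId,
            "propertyType": params.propertyType,
            "order": params.order,
            "locale": params.locale,
            "maxItems": 50,   # page size limit documented by the API
            "numPage": page,
        },
    )
    resp.raise_for_status()
    return resp.json().get("elementList", [])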
I am looking for an apartment in Valencia ("0-EU-ES-46"). The scraper runs each day in the morning:
source .venv/bin/activate
now=$(date +"%m_%d_%Y")
.venv/bin/python3.9 src/db_mongo.py --pages 20 &> "logs/log_inside_${now}"
I log the scraper output to check whether the run completed successfully and to see the statistics at the end:
Scraped flats: 380
Properties for sale
Inserted: 267, 113 found already in the database. Price changes: 12
A Python script runs each day to pick the listings with the highest number of price changes, or the ones whose price dropped most recently.
{
"propertyCode": "100002920",
"prices": [
410000,
405000,
395000,
394900,
394800
],
"dates": [
"2023-01-04",
"2023-02-01",
"2023-03-31",
"2023-09-07",
"2023-09-13"
]
}, ...
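A minimal sketch of what that selection step could look like, assuming documents shaped like the one above stored in MongoDB (the connection string and collection name are illustrative, not the repo's exact code):

from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["idealista"]["listings"]  # illustrative names

def pick_interesting(limit=20):
    flats = list(collection.find({}, {"_id": 0, "propertyCode": 1, "prices": 1, "dates": 1}))
    # Listings with the most recorded price changes
    most_changes = sorted(flats, key=lambda f: len(f.get("prices", [])), reverse=True)[:limit]
    # Listings whose latest recorded price is lower than the previous one, newest change first
    recent_drops = sorted(
        (f for f in flats if len(f.get("prices", [])) >= 2 and f["prices"][-1] < f["prices"][-2]),
        key=lambda f: f["dates"][-1],
        reverse=True,
    )[:limit]
    return most_changes, recent_drops

The selected listings end up in the JSON file that feeds the "Flats" tab mentioned above.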
MongoDB
I store the data in MongoDB. I chose NoSQL for its flexibility and scalability: there is no schema, and some listings might have additional features that others don't. If the data changes drastically, I won't have to perform a complex migration, and if I decide one day to store pictures, I can keep them alongside the rest of my data (MongoDB allows for a variety of data types).
Another reason was that it is a popular tool, and since I have mostly worked with SQL before, I wanted to try and learn something new.
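To illustrate how new scrape results merge into this schema-less collection and produce the "inserted / already found / price changes" counts from the log above, here is a rough sketch (the collection name and any field names beyond those shown earlier are assumptions, not the repo's exact code):

from datetime import date
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["idealista"]["listings"]  # illustrative names

def upsert_listing(listing):
    """Insert a new listing, or append a price point when the price has changed."""
    code = listing["propertyCode"]
    price = listing["price"]                      # assumed field name in the scraped payload
    today = date.today().isoformat()
    existing = collection.find_one({"propertyCode": code})
    if existing is None:
        # No schema: whatever extra fields the listing has are stored as-is
        collection.insert_one({**listing, "prices": [price], "dates": [today]})
        return "inserted"
    if existing["prices"][-1] != price:
        collection.update_one(
            {"propertyCode": code},
            {"$push": {"prices": price, "dates": today}},
        )
        return "price_change"
    return "already_in_db"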