A scratchpad for ideas relating to Airbot.
Data Exploration with Datasette
Airbot has been collecting and scraping data from TCEQ for several years, but outside of the tweets from @Kuukihouston, the data is locked up in a Postgres database. I've long wanted a visual way to explore it, but building out a full data analysis platform is a large undertaking.
The locations that Airbot automatically tweets about would be a good place to begin.
- Download Datasette and see whether any special columns are required for formatting (first sketch after this list)
- Experiment with db-to-sqlite to extract the data
- Dumped data from prod and restored locally
- Converted from Postgres to SQLite with the following command:
    db-to-sqlite "postgresql://localhost/airbot-prod" airdata.db --table air_location --table air_gcreading --table air_readingchangelog
- The process used 19GB of RAM - could it be made less resource-intensive somehow? (second sketch after this list)
- Put it into a Docker container (publish to GitHub?) - third sketch after this list
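To sanity-check the export, Datasette can serve the SQLite file directly. A minimal local run, assuming Datasette is pip-installed (the port is arbitrary):

    # Serve the converted database locally for browsing
    pip install datasette
    datasette airdata.db -p 8001

Plain columns render fine out of the box; richer formatting (e.g. plotting locations on a map) would come from plugins such as datasette-cluster-map, which keys off latitude/longitude column names.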
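On the memory question: a lighter-weight route might be to stream each table out of Postgres as CSV and import it with sqlite-utils, which reads the file incrementally instead of holding the whole table in RAM. A sketch, assuming the same table list as the command above:

    # Stream tables through CSV instead of loading them wholesale
    for t in air_location air_gcreading air_readingchangelog; do
        psql airbot-prod -c "\copy $t TO '$t.csv' WITH (FORMAT csv, HEADER)"
        sqlite-utils insert airdata.db $t $t.csv --csv --detect-types
    done

--detect-types keeps numeric columns from landing as TEXT; otherwise CSV import treats everything as strings.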
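For the container, Datasette can build the image itself with datasette package, which wraps the database and server in a generated Dockerfile. A sketch, assuming a local Docker daemon (the image tag is made up):

    # Build and test-run an image bundling Datasette + the data
    datasette package airdata.db --tag airbot-datasette
    docker run -p 8001:8001 airbot-datasette

Publishing to GitHub would then just be a docker tag / docker push against ghcr.io.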
Publish Data Directly To GitHub
Could access to the data be improved by automatically exporting it to JSON and committing / pushing it to a repository? A sketch of what the export step could look like is below.
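This is essentially the "git scraping" pattern: a cron job exports the data and commits whatever changed. A sketch, where the table name and data/ path are guesses at the real layout:

    # Hypothetical nightly export: dump readings as JSON lines and push
    psql airbot-prod -At \
      -c "SELECT row_to_json(r) FROM air_gcreading r" > data/air_gcreading.jsonl
    git add data/
    git commit -m "Data update $(date -I)" && git push

-A (unaligned) and -t (tuples only) make psql emit one bare JSON object per line, so the output diffs cleanly in git.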
GitHub seems to allow at least 5GB of disk space per repository, so we should be able to store quite a bit of data.
Set up a repo per year if it gets too large?
- All locations?
- How do you organize the data in individual files?
- location/date/<all readings>? (sketched below)
- location/pollutant/<all readings for every date>?
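To get a feel for the first layout, a sketch that splits a JSON-lines export into one file per location per day; location_id and recorded_at are assumed column names:

    # Split readings into data/<location>/<date>.jsonl (first layout above)
    psql airbot-prod -At -c "SELECT row_to_json(r) FROM air_gcreading r" |
    while read -r row; do
        loc=$(jq -r '.location_id' <<< "$row")
        day=$(jq -r '.recorded_at[0:10]' <<< "$row")
        mkdir -p "data/$loc"
        printf '%s\n' "$row" >> "data/$loc/$day.jsonl"
    done

Far too slow for a full backfill, but enough to compare how the two layouts feel against a sample of the data.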