Airbot ideas

From James Van Dyne

A scratchpad for ideas relating to Airbot.

2020-11-14

Data Exploration with Datasette

Airbot has been collecting (scraping) data from TCEQ for several years, but apart from the tweets from @Kuukihouston, the data is locked up in a Postgres database. I've wanted a visual way to explore the data for a long time, but building out a data analysis platform is a large undertaking.

Inspired by this post about data warehouses, I wonder whether I could export part of the dataset to SQLite and set up a Datasette instance to explore the data.

The locations that Airbot automatically tweets about could be a good starting point.

Todos:

  • Download Datasette and see whether any special columns are required for formatting
  • Experiment with db-to-sqlite to extract the data
    • 2020-11-23
      • Dumped data from prod and restored locally
      • Converted from Postgres to SQLite with the following command: db-to-sqlite "postgresql://localhost/airbot-prod" airdata.db --table air_location --table air_gcreading --table air_readingchangelog
        • The process used 19 GB of RAM; could this be made less resource-intensive?
  • Put it into a Docker container (publish to GitHub?)
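On the 19 GB question: one possible way to keep memory bounded is to stream rows in fixed-size batches instead of loading a whole table at once. The sketch below is not how db-to-sqlite works internally (I haven't checked); it is a hypothetical helper, `copy_table_in_batches`, that takes any DB-API cursor and copies one table into SQLite `batch_size` rows at a time. For Postgres via psycopg2, the cursor would need to be a server-side (named) cursor, otherwise the driver buffers the full result client-side anyway. The optional `where` clause could restrict the export to the locations Airbot tweets about.

```python
import sqlite3
from typing import Any, Sequence


def copy_table_in_batches(
    src_cursor: Any,
    dest: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    where: str = "",
    batch_size: int = 10_000,
) -> int:
    """Copy `columns` of `table` from a DB-API cursor into a SQLite table.

    Rows are fetched and inserted `batch_size` at a time, so only one batch
    is held in Python memory, provided the source cursor streams results
    (for psycopg2, a server-side cursor such as conn.cursor(name="export")).
    Table/column names and `where` are interpolated as raw SQL, so they must
    be trusted. Column types are not carried over: the SQLite table is untyped.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    dest.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    src_cursor.execute(f"SELECT {col_list} FROM {table} {where}")
    insert_sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    copied = 0
    while True:
        rows = src_cursor.fetchmany(batch_size)
        if not rows:
            break
        dest.executemany(insert_sql, rows)
        dest.commit()  # commit each batch so progress is persisted incrementally
        copied += len(rows)
    return copied
```

The column lists for `air_location`, `air_gcreading` and `air_readingchangelog` would have to be filled in from the real schema. Postgres `numeric` columns come back from psycopg2 as `Decimal`, which sqlite3 won't bind by default; `sqlite3.register_adapter(Decimal, float)` (or `str`) is one way around that.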

Publish Data Directly to GitHub

Could access to the data be improved by automatically exporting it to JSON and committing / pushing it to a repository?

GitHub seems to allow up to roughly 5 GB per repository (its documentation strongly recommends staying below that), so we should be able to store quite a bit of data.

Set up a repo per year if it gets too large?

Unknowns:

  • Export all locations, or just a subset?
  • How should the data be organized into individual files?
    • location/date/<all readings> ?
    • location/pollutant/<all readings for every date>?
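To get a feel for the first layout option (location/date/<all readings>), here is a minimal sketch. It assumes, hypothetically, that each reading is a dict with `location`, `timestamp` (an ISO 8601 string), `pollutant` and `value` keys; the real Airbot columns may differ. Each file holds one location's readings for one day, and the output is sorted and pretty-printed so that re-exporting unchanged data produces no git diff.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Iterable, Mapping


def export_readings(readings: Iterable[Mapping[str, Any]], root: Path) -> list[Path]:
    """Write readings as root/<location>/<YYYY-MM-DD>.json, one file per location per day.

    Assumes (hypothetically) 'location', 'timestamp' (ISO 8601 string),
    'pollutant' and 'value' keys on every reading.
    """
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for reading in readings:
        day = reading["timestamp"][:10]  # 'YYYY-MM-DD' prefix of an ISO 8601 timestamp
        groups[(str(reading["location"]), day)].append(dict(reading))

    written = []
    for (location, day), rows in sorted(groups.items()):
        path = root / location / f"{day}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Stable ordering and key order keep diffs small between exports.
        rows.sort(key=lambda row: (row["timestamp"], row["pollutant"]))
        path.write_text(json.dumps(rows, indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written
```

Because the year is the first four characters of each file name, splitting into one repo per year later would only mean filtering paths. Swapping the grouping key to `(location, pollutant)` would give the second layout instead, at the cost of files that grow without bound.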