Optimizing Near-real Time Module


The scenario: we have a Python script that runs at a given interval and automatically loads a shapefile into a PostgreSQL database with PostGIS. The shapefile comes from a computer simulation and is updated regularly.
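A minimal sketch of what such a loader could look like, assuming the standard PostGIS command-line tools `shp2pgsql` and `psql` are available; the table, database, and SRID below are placeholders, not the post's actual setup. The function only builds the shell pipeline, so the scheduling (cron or otherwise) is left out.

```python
import shlex

def build_load_command(shapefile, table, dbname, srid=4326):
    """Build a shell pipeline that loads a shapefile into PostGIS.

    shp2pgsql's -d flag drops and recreates the target table and -s sets
    the SRID; the generated SQL is piped into psql. All names here
    (table, dbname, srid) are illustrative assumptions.
    """
    shp2pgsql = "shp2pgsql -d -s {srid} {shp} {table}".format(
        srid=srid, shp=shlex.quote(shapefile), table=table)
    psql = "psql -d {db}".format(db=dbname)
    return "{0} | {1}".format(shp2pgsql, psql)

# The command a scheduled job might run at each interval:
cmd = build_load_command("simulation_output.shp", "sim_grid", "gisdb", srid=32651)
```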

The Near-real Time module of the app performs an intersect (spatial intersection) query to generate statistics, involving two tables in the database. Every query took more than five minutes, which is unacceptable. No matter how I tried to optimize the query, it was still slow.
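For context, a representative shape of such a spatial-join query, written here as SQL strings the way the post's own snippet does; the table and column names (hazard_grid, buildings, geom) are assumptions, not the actual schema. Without a spatial index, a join like this compares geometries across every row pair, so a GiST index on each geometry column is the usual first fix, though precomputing (as described below) sidesteps the per-request cost entirely.

```python
# Assumed shape of the slow query: count buildings per hazard grid cell.
INTERSECT_SQL = """
    select g.gridcode, b.bldg_name, b.bldg_type, count(distinct b.id)
    from hazard_grid g
    join buildings b on ST_Intersects(g.geom, b.geom)
    group by 1, 2, 3;
"""

# A GiST spatial index lets PostGIS prune candidate pairs by bounding box.
GIST_INDEX_SQL = """
    create index if not exists buildings_geom_gix
    on buildings using gist (geom);
"""
```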

So, instead of performing the intersect query between the two tables directly every time a user makes a request, I do this instead:

I created a stats table where the results are inserted and stored. After the shapefile is loaded into the database, a PostgreSQL function is called; it performs the intersect query and inserts the new results into the stats table.
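One way such a function might look, again kept as a SQL string in Python; the function name, source tables, and stats_table columns are assumptions inferred from the SELECT query shown below (which counts distinct ids, implying stats_table stores row-level results stamped with the load time).

```python
# Sketch of the PostgreSQL function the loader could call after each load.
# All identifiers here are illustrative, not the post's actual schema.
REFRESH_STATS_SQL = """
create or replace function refresh_stats(load_time timestamp)
returns void as $$
begin
    insert into stats_table (datetime, gridcode, bldg_name, bldg_type, id)
    select load_time, g.gridcode, b.bldg_name, b.bldg_type, b.id
    from hazard_grid g
    join buildings b on ST_Intersects(g.geom, b.geom);
end;
$$ language plpgsql;
"""

# The loader would then run something like:
CALL_SQL = "select refresh_stats(now()::timestamp);"
```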

On a user request, the database server just performs a SELECT query against the stats table, which is of course much faster.

cursor = connection.cursor()
cursor.execute("""
    select gridcode, bldg_name, bldg_type, count(distinct id)
    from stats_table
    where datetime = (select datetime from stats_table order by datetime desc limit 1) {0} {1} {2}
    group by 1, 2, 3;
    """.format(muni, brgy, bldg_type))  # muni, brgy, bldg_type hold optional pre-built filter clauses

The subquery gets the latest datetime in the stats table and uses it as the basis for generating the statistics.
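One caveat worth noting: the `{0} {1} {2}` placeholders suggest the filter clauses are built as raw strings before being formatted into the SQL, which risks SQL injection if any value comes from user input. A safer sketch, assuming the column names `muni`, `brgy`, and `bldg_type` (they are inferred, not confirmed by the post), builds parameterized clauses so the driver handles the values:

```python
def build_filters(muni=None, brgy=None, bldg_type=None):
    """Build optional 'and col = %s' clauses plus their parameter list.

    User-supplied values never enter the SQL text; they are passed
    separately so the database driver can bind them safely.
    """
    clauses, params = [], []
    for column, value in (("muni", muni), ("brgy", brgy),
                          ("bldg_type", bldg_type)):
        if value is not None:
            clauses.append("and {0} = %s".format(column))
            params.append(value)
    return " ".join(clauses), params
```

The query string then receives only the clause text, and the parameter list is passed as the second argument to `cursor.execute`.
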
