Materialized Views: Caching database query

Materialized View Purpose[/caption] Before starting with a materialized view, let’s talk about database views.

What is a database view?

A database view is a stored set of queries, which gets executed whenever a view is called or evoked. Unlike the regular tables, the view doesn’t occupy any physical space in your hard disk but its schema and everything is stored in the system memory. It helps abstract away the underlying tables and makes it easier to work with. They can also be called as pseudo tables. Quoted from the PostgerSQL documentation.
Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.  
CREATE VIEW company_manager AS
SELECT id, name, email
FROM  companies
WHERE role='manager';
Now to access all the managers
SELECT * FROM company_managers;
  Making more use of views makes your DB design much cleaner, but here we are talking more about using Materializing views. As that would lead to the more direct performance boost.

So what is a Materialized view?

The materializing view was first introduced in oracle. But now you can find it in most database systems like PostgreSQL, MicrosoftSQL server, IBM DB2, Sybase. MySQL doesn’t have native support for it, but you can find extensions for it which would help achieve this Materialized view is also called Matview. It is a form of database view that also has the result of the query as well. Which speeds up the results because now, you don’t have to run the query to get the results, as its already there, calculated. Of course, there are cases where we can’t have this, where we need more real-time information. But while generating reports you create a matview and then later refresh the matview to get the updated reports. Things to note about matview are:
  1. It’s read-only (pseudo-table) so you can’t update it.
  2. You need to refresh the table to get the latest data.
  3. While refreshing, it would block other connections to access the existing data from the material view, so you need to make the refresh run concurrently

So why use Materialized views in Rails?

  • Capture commonly used joins & filters.
  • Push data intensive processing from Ruby to Database.
  • Allow fast and live filtering of complex associations or calculation fields.

How do you use it in Rails?

Well thanks to active record, it’s quite easy to use this in our code. But we need a bit of SQL as well. First, we add the migration to create the materialized views.
bundle exec rails g migration create_all_time_sales_mat_view
In the migration file, we add the SQL
class CreateAllTimesSalesMatView < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE MATERIALIZED VIEW all_time_sales_mat_view AS
        SELECT sum(amount) as total_sale,
        DATE_TRUNC('day', invoice_adte) as date_of_sale
      FROM sales
      GROUP BY DATE_TRUNC('day', invoice_adte)
    SQL
  end
  def down
    execute("DROP MATERIALIZED VIEW IF EXISTS all_time_sales_view")
  end
end
Once the view is ready , we can create the model for this at app/models/all_time_sales_mat_view.rb
class AllTimeSalesMatView < ActiveRecord::Base
  self.table_name = 'all_time_sales_mat_view'
  def readonly?
    true
  end
  def self.refresh
    ActiveRecord::Base.connection.execute('REFRESH MATERIALIZED VIEW CONCURRENTLY all_time_sales_mat_view')
  end
end
Now we select and query the model as usual.
AllTimeSalesMatView.select(:date_of_sale)
AllTimeSalesMatView.sum(:total_sale)
We can’t do any create, save or update. As its a read-only table. Creating a table with a total of million sales record for every date in the last year, gave us the following speed improvement.
Regular
       user     system      total        real
     (976.4ms)  0.020000   0.000000   0.020000 (  0.990258)
MatiView
     (2.3ms)    0.000000   0.010000   0.010000 (  0.012010)
Over 10 times speed improvement, yay!!

Summarize

Good Points

  • Faster to fetch data.
  • Capture commonly used joins & filters.
  • Push data intensive processing from Ruby to Database.
  • Allow fast and live filtering of complex associations or calculation .fields.

Pain Points

  • To alter table we need to write SQL
  • We will be using more RAM and Storage
  • Requires Postgres 9.3 for MatView
  • Requires Postgres 9.4 to refresh concurrently
  • Can’t have Live data
    • You can fix this by creating your own MatViewTable and updating it with the latest information

References

  • https://www.postgresql.org/docs/9.3/static/rules-materializedviews.html
  • http://en.wikipedia.org/wiki/Materialized_view
  • http://dev.mysql.com/doc/refman/5.7/en/create-view.html
  • https://blog.pivotal.io/labs/labs/database-views-performance-rails
  • https://www.sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails/
]]>