optimization – redpanthers.co

Materialized Views: Caching database queries

coderhs — Wed, 30 Nov 2016 09:22:46 +0000

What is a database view?

A database view is a stored query that is executed whenever the view is called or invoked. Unlike a regular table, a view stores no data of its own; only its definition is kept in the system catalog. It abstracts away the underlying tables and makes them easier to work with, which is why views are sometimes called pseudo-tables. To quote the PostgreSQL documentation:

Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.

CREATE VIEW company_managers AS
SELECT id, name, email
FROM  companies
WHERE role='manager';

Now, to access all the managers:

SELECT * FROM company_managers;

Making more use of views makes your DB design much cleaner, but here we are going to talk about materialized views, as they lead to a more direct performance boost.

So what is a Materialized view?

Materialized views were first introduced in Oracle, but you can now find them in most database systems, such as PostgreSQL, Microsoft SQL Server, IBM DB2, and Sybase. MySQL doesn’t have native support for them, but there are extensions that help achieve the same thing.

A materialized view, also called a matview, is a form of database view that stores the result of its query as well. This speeds things up because you don’t have to run the query to get the results; they are already there, calculated. Of course, there are cases where this won’t do, because we need real-time information. But when generating reports, you can create a matview and later refresh it to get the updated reports. Things to note about matviews:

It’s read-only (pseudo-table) so you can’t update it.
You need to refresh the view to get the latest data.
A plain refresh blocks other connections from reading the materialized view while it runs, so run the refresh concurrently (REFRESH MATERIALIZED VIEW CONCURRENTLY, which needs a unique index on the view).
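The store-then-refresh behaviour described above can be sketched in plain Ruby. This is only an in-memory analogy, not a PostgreSQL or Rails API; the sales data and all variable names are hypothetical.

```ruby
# Hypothetical base table: one hash per sale.
sales = [
  { amount: 10, day: "2016-11-01" },
  { amount: 5,  day: "2016-11-01" },
  { amount: 7,  day: "2016-11-02" }
]

# Plays the role of the view's defining query: total sales per day.
daily_totals_query = lambda do
  sales.group_by { |sale| sale[:day] }
       .map { |day, rows| { day: day, total: rows.sum { |row| row[:amount] } } }
end

# "CREATE MATERIALIZED VIEW": run the query once and keep the result.
matview_rows = daily_totals_query.call

# A new sale arrives in the base table...
sales << { amount: 3, day: "2016-11-02" }
stale = matview_rows # ...but the stored result has not changed.

# "REFRESH MATERIALIZED VIEW": re-run the query and replace the stored result.
matview_rows = daily_totals_query.call
fresh = matview_rows
```

Reads against the stored rows never touch the base data, which is where the speed comes from, and also why the result is only as current as the last refresh.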

So why use Materialized views in Rails?

Capture commonly used joins & filters.
Push data-intensive processing from Ruby to the database.
Allow fast and live filtering of complex associations or calculated fields.

How do you use it in Rails?

Well, thanks to Active Record, it’s quite easy to use this in our code, but we need a bit of SQL as well. First, we add the migration to create the materialized view.

bundle exec rails g migration create_all_time_sales_mat_view

In the migration file, we add the SQL

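class CreateAllTimeSalesMatView < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE MATERIALIZED VIEW all_time_sales_mat_view AS
        SELECT sum(amount) AS total_sale,
               DATE_TRUNC('day', invoice_date) AS date_of_sale
        FROM sales
        GROUP BY DATE_TRUNC('day', invoice_date)
    SQL
    # REFRESH ... CONCURRENTLY requires a unique index on the materialized view.
    execute <<-SQL
      CREATE UNIQUE INDEX index_all_time_sales_mat_view_on_date_of_sale
        ON all_time_sales_mat_view (date_of_sale)
    SQL
  end

  def down
    # Dropping the materialized view also drops its index.
    execute("DROP MATERIALIZED VIEW IF EXISTS all_time_sales_mat_view")
  end
end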

Once the view is ready, we can create the model for it at app/models/all_time_sales_mat_view.rb

class AllTimeSalesMatView < ActiveRecord::Base
  self.table_name = 'all_time_sales_mat_view'
  def readonly?
    true
  end
  def self.refresh
    ActiveRecord::Base.connection.execute('REFRESH MATERIALIZED VIEW CONCURRENTLY all_time_sales_mat_view')
  end
end

Now we select and query the model as usual.

AllTimeSalesMatView.select(:date_of_sale)
AllTimeSalesMatView.sum(:total_sale)

We can’t do any create, save, or update, as it’s a read-only table. With a table holding a total of a million sales records spread across every date in the last year, we saw the following speed improvement:

Regular
       user     system      total        real
     (976.4ms)  0.020000   0.000000   0.020000 (  0.990258)
MatView
     (2.3ms)    0.000000   0.010000   0.010000 (  0.012010)

Going by the "real" column (0.99 s versus 0.012 s), that is roughly 80 times faster, yay!!
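The timing table above has the shape printed by Ruby's standard Benchmark library. A minimal sketch of that kind of comparison, using hypothetical in-memory data instead of a real database (so the absolute numbers will not match the ones above):

```ruby
require "benchmark"

# Hypothetical data: one hash per sale, spread over 365 days.
sales = Array.new(200_000) { |i| { day: i % 365, amount: (i % 100) + 1 } }

# "Regular" path: aggregate the raw rows on every request.
aggregate = lambda do
  sales.each_with_object(Hash.new(0)) do |row, totals|
    totals[row[:day]] += row[:amount]
  end
end

# "MatView" path: the aggregate was computed once and stored.
stored_totals = aggregate.call

recomputed = nil
read_back = nil
Benchmark.bm(8) do |x|
  x.report("Regular") { recomputed = aggregate.call }
  x.report("MatView") { read_back = stored_totals.dup }
end
```

Both paths return the same totals; only the amount of work done at read time differs.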

Summary

Good Points

Faster to fetch data.
Capture commonly used joins & filters.
Push data-intensive processing from Ruby to the database.
Allow fast and live filtering of complex associations or calculated fields.

Pain Points

To alter the view, we need to write SQL.
We will be using more RAM and storage.
Requires PostgreSQL 9.3 or later for materialized views.
Requires PostgreSQL 9.4 or later to refresh concurrently.
Can’t have live data.
- You can work around this by creating your own MatView table and updating it with the latest information.

References

https://www.postgresql.org/docs/9.3/static/rules-materializedviews.html
http://en.wikipedia.org/wiki/Materialized_view
http://dev.mysql.com/doc/refman/5.7/en/create-view.html
https://blog.pivotal.io/labs/labs/database-views-performance-rails
https://www.sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails/


Introduction to generating JSON using PostgreSQL

coderhs — Thu, 24 Nov 2016 04:38:33 +0000

Introduction

One of the major requirements for any online business is a backend that either provides, or can be extended to provide, an API. Building websites with static HTML and simple jQuery Ajax is coming to an end; in this era, JavaScript frameworks rule the market. Hence it is a good decision for the database to support JSON, as JSON is becoming the glue that connects the frontend and the backend.

Rails has built-in support for generating JSON, as it’s our Swiss Army knife of web development, and it encourages a REST URL structure, so it is a good choice for building an API. It is good enough up to a certain point of growth. Sooner or later you will hit bottlenecks where you have more requests than you can handle, and you have to either spin up more servers or move to a concurrent language such as Elixir or Go. Before we reach that scale and burn down the existing codebase, we can have the database generate JSON responses for us, which is 10 times faster than generating JSON in Rails (though more verbose).

Since PostgreSQL 9.2, the database has taken a major leap in supporting JSON. The support that PostgreSQL provides can be divided into two areas:

Storing data in JSON and JSONB format
Generating JSON results from the query itself

This article is an introduction to generating JSON from the query itself.

Getting Started

One of the advantages of using the database to generate JSON is speed: I have found it fast when generating small JSON documents and much faster still when generating complex ones. (Note: the speed is in comparison with Rails, not with respect to the database itself.)

How to generate JSON

The simplest way to do that is row_to_json(). For example, a query to return the user with id 1 as JSON:

select row_to_json(users) from users where id = 1;

Result:

{"id":1,"email":"hsps@redpanthers.co","encrypted_password":"iwillbecrazytodisplaythat",
"reset_password_token":null,"reset_password_sent_at":null,
"remember_created_at":"2016-11-06T08:39:47.983222",
"sign_in_count":11,"current_sign_in_at":"2016-11-18T11:47:01.946542",
"last_sign_in_at":"2016-11-16T20:46:31.110257",
"current_sign_in_ip":"::1","last_sign_in_ip":"::1",
"created_at":"2016-11-06T08:38:46.193417",
"updated_at":"2016-11-18T11:47:01.956152",
"first_name":"Super","last_name":"Admin","role":3}

If you want to send only specific fields:

select row_to_json(results)
from (
  select id, email from users where id = 1
) as results

Result:

{"id":1,"email":"hsps@redpanthers.co"}

Now let’s see how to generate more complex JSON with sub JSON, and arrays.

select row_to_json(result)
from (
  select id, email,
    (
      select array_to_json(array_agg(row_to_json(user_projects)))
      from (
        select id, name
        from projects
        where user_id=users.id
        order by created_at asc
      ) user_projects
    ) as projects
  from users
  where id = 1
) result

This would return the JSON response:

{"id":1,"email":"hsps@redpanthers.co","projects":[{"id":3,"name":"CSnipp"}]}
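For comparison, this is roughly the work the nested query takes over from the application. It is a minimal plain-Ruby sketch using the standard json library; the hashes are hypothetical stand-ins for Active Record objects, not the author's code.

```ruby
require "json"

# Hypothetical stand-ins for rows of the users and projects tables.
user = { id: 1, email: "hsps@redpanthers.co" }
projects = [{ id: 3, name: "CSnipp", user_id: 1 }]

# Nest the user's projects under a "projects" key, keeping only id and name.
payload = user.merge(
  projects: projects.select { |project| project[:user_id] == user[:id] }
                    .map { |project| { id: project[:id], name: project[:name] } }
)

json = JSON.generate(payload)
```

In a real application each of these steps also involves loading and instantiating model objects, which is the overhead the database-side approach avoids.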

The issue with the above code is that it is more verbose than the equivalent Ruby code. We need to make sure that the sacrifice is worthwhile, so when working with APIs, use it only where you see a delay in JSON generation. Just as we used array_agg above to aggregate values into an array and then convert it to JSON, we can aggregate directly to JSON using json_agg.

array_to_json(array_agg(row_to_json(user_projects)))

can be shortened to

json_agg(user_projects)

Since building JSON this way can be tedious, PostgreSQL 9.4 introduced a new function called json_build_object. A simple usage looks like this:

json_build_object('foo',1,'bar',2)

which will output

{"foo": 1, "bar": 2}

Also, we can use it to build complex JSON trees by creating functions within the PostgreSQL database. Of course, as we do that, we move more and more of the application's logic into the database, and we need to run a migration every time we want to update a function. So, as I said before, we are sacrificing convenience here, and we should reach for this only as the complexity of our JSON generation increases. I will cover how to write PostgreSQL functions that make generating more complex JSON structures easier in the second part of this article.
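Even without custom functions, json_build_object and json_agg can be combined to produce the nested user-with-projects document. The following is a sketch, assuming the users and projects tables used earlier and PostgreSQL 9.4 or later; the variable name is made up for illustration, and the commented-out call assumes a loaded Rails application.

```ruby
# SQL kept in a heredoc so it can be handed to the database as one statement.
user_json_sql = <<~SQL
  select json_build_object(
    'id', users.id,
    'email', users.email,
    'projects', (
      select json_agg(
        json_build_object('id', projects.id, 'name', projects.name)
        order by projects.created_at
      )
      from projects
      where projects.user_id = users.id
    )
  )
  from users
  where users.id = 1
SQL

# Inside a Rails app (assumption), the JSON string could be fetched with:
# json = ActiveRecord::Base.connection.select_value(user_json_sql)
```

Note that json_agg returns NULL, not an empty array, when a user has no projects, so the API consumer has to be ready for "projects": null.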

References

https://www.postgresql.org/docs/current/static/functions-json.html
http://bytefish.de/blog/postgresql_json/

Different types of Index in PostgreSQL

coderhs — Mon, 19 Sep 2016 07:38:45 +0000

PostgreSQL offers several different algorithms for indexing tables, and each one suits a certain kind of data. Here we will discuss the available algorithms and when to use each of them. (Note: these are the algorithms found in PostgreSQL 9.5.)

Algorithms

B-Tree (balanced tree) is the default algorithm used when we build indexes in Rails. It keeps the indexed column's values in sorted order. So if we search for words starting with "a", the scan can stop as soon as it moves past the last word starting with "a", because everything after that point sorts later; if nothing matched, it returns no rows. It works well in most cases, which is why it is the default.
Hash is one of the most popular indexing algorithms, but it supports only the equality operator (=), so the query planner will consider a hash index only for equality comparisons. Also note that in PostgreSQL 9.5 hash indexes are not WAL (Write-Ahead Log) logged, so after a database crash the index may need to be rebuilt with REINDEX. (Hash indexes became WAL-logged in PostgreSQL 10.)
GIN (Generalized Inverted Index) is great for indexing columns and expressions that contain arrays, JSONB documents, and similar composite values. Internally, a GIN index contains a B-tree index constructed over keys, where each key is an element of one or more indexed items, and where each tuple in a leaf page contains either a pointer to a B-tree of heap pointers or, when the list is small enough, a simple list of heap pointers.
GiST (Generalized Search Tree) isn’t a single indexing scheme but rather an abstraction that makes it possible to implement indexing schemes for new data types by providing a balanced, tree-structured access method. In the past, implementing a custom indexing algorithm for a custom data type required an understanding of the database internals. GiST abstracts those internals away, so you can build your own indexing scheme on top of it. It is commonly used to index IP addresses, geolocation data, and the like.
SP-GiST (Space-Partitioned Generalized Search Tree) is, as the name suggests, a variant of GiST, but instead of a balanced tree structure it supports non-balanced structures such as radix trees, quadtrees, and k-d trees.
BRIN (Block Range Index) is designed to handle very large tables in which the rows’ natural sort order correlates with certain column values. For example, a table storing log entries might have a timestamp column for when each log entry was written. With a BRIN index on this column, queries that filter by timestamp can skip large parts of the table, at very little overhead.
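The access patterns behind three of these algorithms can be illustrated with plain Ruby data structures. These are toy in-memory analogies under made-up data, not how PostgreSQL implements its indexes.

```ruby
words = %w[pear apple blueberry banana apricot cherry]

# B-tree analogy: a sorted copy of the column supports prefix and range scans.
sorted = words.sort
prefix = "b"
# Binary-search for the first candidate, then stop once the prefix no longer matches.
start = sorted.bsearch_index { |word| word >= prefix } || sorted.length
prefix_matches = sorted.drop(start).take_while { |word| word.start_with?(prefix) }

# Hash analogy: constant-time equality lookups, but no ordering to scan.
row_positions = words.each_with_index.to_h
banana_row = row_positions["banana"]

# BRIN analogy: keep only the min and max per block of naturally ordered rows.
timestamps = (1..1000).to_a
block_summaries = timestamps.each_slice(100).map(&:minmax)
wanted = 250..260
# Only blocks whose [min, max] range overlaps the query need to be read.
candidate_blocks = block_summaries.each_index.select do |i|
  min, max = block_summaries[i]
  min <= wanted.end && max >= wanted.begin
end
```

The BRIN part only pays off because the timestamps arrive in order; with shuffled values every block's min/max range would overlap most queries and nothing could be skipped.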