<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hash &#8211; redpanthers.co</title>
	<atom:link href="/tag/hash/feed/" rel="self" type="application/rss+xml" />
	<link>/</link>
	<description>Red Panthers - Experts in Ruby on Rails, System Design and Vue.js</description>
	<lastBuildDate>Tue, 31 Jan 2017 07:32:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.2.7</generator>
	<item>
		<title>Behind the scenes of hash table performance in ruby 2.4</title>
		<link>/behind-scenes-hash-table-performance-ruby-2-4/</link>
				<comments>/behind-scenes-hash-table-performance-ruby-2-4/#comments</comments>
				<pubDate>Tue, 31 Jan 2017 07:32:26 +0000</pubDate>
		<dc:creator><![CDATA[aboobacker]]></dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Ruby 2.4]]></category>
		<category><![CDATA[Hash]]></category>
		<category><![CDATA[hashtables]]></category>

		<guid isPermaLink="false">https://redpanthers.co/?p=922</guid>
				<description><![CDATA[
				<![CDATA[]]>		]]></description>
								<content:encoded><![CDATA[<p>				<![CDATA[<strong>Ruby 2.4</strong> got released this Christmas with a lot of exciting features. One of the most underrated features in ruby 2.4 is hash table improvements. Before going into details about implementation, let&#8217;s first check the benchmark to know how this change gonna affect your ruby application.
Some benchmarks are:


<h4 id="gettingkeysandvaluesofahash">Getting keys and values of a hash</h4>




<pre><code class="ruby">h = {}
10000.times do |i|
  h[i] = nil
end
# Get all hash values
Benchmark.measure { 50000.times { h.values } }
# Get all hash keys
Benchmark.measure { 50000.times { h.keys } }
</code></pre>


Output
Ruby 2.3.3


<pre class="lang:ruby decode:true"> =&gt; #&lt;Benchmark::Tms:0x00000002a0f990 @label="", @real=2.8023432340005456, @cstime=0.0, @cutime=0.0, @stime=0.13000000000000012, @utime=2.6799999999999997, @total=2.8099999999999996&gt;
#&lt;Benchmark::Tms:0x00000002963398 @label="", @real=2.767410662000657, @cstime=0.0, @cutime=0.0, @stime=0.029999999999999805, @utime=2.729999999999997, @total=2.7599999999999967&gt;</pre>


&nbsp;
ruby 2.4.0


<pre class="lang:ruby decode:true">#&lt;Benchmark::Tms:0x0000000226d700 @label="", @real=0.8854832770002758, @cstime=0.0, @cutime=0.0, @stime=0.08999999999999997, @utime=0.7999999999999998, @total=0.8899999999999998&gt;
#&lt;Benchmark::Tms:0x000000022b1018 @label="", @real=0.8476084579997405, @cstime=0.0, @cutime=0.0, @stime=0.06999999999999995, @utime=0.7799999999999994, @total=0.8499999999999993&gt;</pre>


Yeah, the above two operations executed <strong>~ 3 times</strong> faster on my laptop. Though these numbers can vary with your machine and processor, the performance improvements will be significant on all modern processors. Not all operations became 3 times faster , average perfomence improvement is more than<strong> 50%</strong>
If you are a ruby developer and excited to know what are the new features in ruby 2.4, then this feature gonna make your application faster and you don&#8217;t have to change anything in the code for that. Because these changes are <strong>backward compatible</strong>. If you are curious to know what happened behind the scenes of this performance boost, then continue reading.


<h2 id="hashtable">Hash Table</h2>




<blockquote>In computing, hash table (hash map) is a data structure that is used to implement an associative array, a structure that can map keys to values. Hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. <a href="https://en.wikipedia.org/wiki/Hash_table">Wikipedia</a></blockquote>


In other words, it is a data structure that allows you to store key value pair and helps to fetch specific data using the key in an efficient way. Unlike arrays, you don&#8217;t have to iterate through all elements to find a given element in the hash. If you are new to this data structure, check this <a href="https://www.youtube.com/watch?v=h2d9b_nEzoA">cs50 video</a> for a better understanding.


<pre class="lang:ruby decode:true">hash = {key1: value1, key2: value2}</pre>


That is cool right! Now, let&#8217;s see how this is made possible in ruby.


<h3 id="pre24">Pre 2.4</h3>


Let&#8217;s first check how ruby implemented Hash in pre 2.4 <img class="alignnone" src="http://i.imgur.com/zGSViFd.png" alt="old hash representation , hash table" width="703" height="479" />
Ruby internally uses a structure called st_<em>table to represent hash. st_</em>table contains type, the number of bins, the number of entries and pointers to bin array. Bin array is an array with 11 default size and can grow when required. Let&#8217;s take an example of hash and see how it will be represented using above diagram.


<pre><code class="ruby">sample_hash = {a: 10, b: 20, c: 30, d: 40, e: 50}
</code></pre>


Let&#8217;s take keys :c and :d
<strong>Step 1:</strong>
First thing ruby does is it will take the hash of the key using the internal hash function.


<pre class="lang:default decode:true">2.3.1 :075 &gt; :c.hash
=&gt; 2782
2.3.1 :076 &gt; :d.hash
=&gt; 2432</pre>


<strong>Step 2:</strong> After getting the hash value, result ?it  takes modulo of 11 to get figure in which bin ruby can store the given pair


<pre class="lang:default decode:true">2.3.1 :073 &gt; :c.hash % 11
=&gt; 10
2.3.1 :074 &gt; :d.hash % 11
=&gt; 1</pre>


This means we can put :c =&gt; 30 in 10&#8217;th bin and :d in 1st bin
<strong>Step 3</strong>
What if there are multiple modulo operations that give the result? This is called hash collision. To resolve this, ruby uses a separate chaining approach  i.e, it will make a doubly linked list and add it to the existing value in the bin.
<strong>Step 4</strong>
What if the hash is too large ?? Linked list will start growing and will make the hash slower. So, ruby will allocate more bins and do an operation called <strong>Rehashing</strong> to utilise newly added bins.
<span id="wmd-input-section-15489" class="wmd-input-section"><span class="token p"><span class="token strong"><strong>Improvements in 2.0</strong></span></span></span><span id="wmd-input-section-15489" class="wmd-input-section"></span>
<span id="wmd-input-section-15489" class="wmd-input-section"><span class="token p">In ruby 2.0, Ruby eliminated the need for extra data structures for smaller hashes and uses linear search for better performance.</span></span><span id="wmd-input-section-15489" class="wmd-input-section"></span>
<strong><span id="wmd-input-section-15489" class="wmd-input-section"><span class="token p"><span class="token strong">Improvements in 2.2</span></span></span></strong>
<span id="wmd-input-section-15489" class="wmd-input-section"></span><span id="wmd-input-section-15489" class="wmd-input-section"><span class="token p">In 2.2.0, Ruby has used bin array sizes that correspond to powers of two (16, 32, 64, 128, &#8230;).</span> </span>


<h3 id="changesin24">Changes in 2.4</h3>


[caption id="attachment_1629" align="aligncenter" width="300"]<a href="https://redpanthers.co/wp-content/uploads/2016/12/hash.png"><img class="wp-image-1629 size-medium" src="https://redpanthers.co/wp-content/uploads/2016/12/hash-300x201.png" alt="new hash structure in ruby 2.4 , hash table" width="300" height="201" /></a> Source: https://github.com/ruby/ruby/blob/trunk/st.c[/caption]
In ruby 2.4, Hash table is moved to <a href="https://en.wikipedia.org/wiki/Open_addressing">open addressing model</a> i.e, we no longer have the separate doubly linked list for collision resolution. Here, we will be storing records in the entries array itself i.e, there is no need of pointer chasing and data will be stored in the adjacent memory location (Data locality). The hash table has two arrays called bins and entries. Entry array contains hash entries in the order of insertion and the bin array provides access to entry the by their keys. The key hash is mapped to a bin containing the index of the corresponding entry in the entry array.
&nbsp;
<span id="wmd-input-section-13012" class="wmd-input-section"><span class="token p"><span class="token strong"><strong>Inserting entries in Hash</strong></span></span></span><span id="wmd-input-section-13012" class="wmd-input-section"></span>
<span id="wmd-input-section-13012" class="wmd-input-section"><span class="token p">Step 1:</span></span>
<span id="wmd-input-section-13012" class="wmd-input-section"><span class="token p">Ruby will insert an entry to entries array in sequential order.</span></span><span id="wmd-input-section-13012" class="wmd-input-section"></span>
<span id="wmd-input-section-13012" class="wmd-input-section"><span class="token p">Step 2:</span></span><span id="wmd-input-section-13012" class="wmd-input-section"></span>
<span id="wmd-input-section-13012" class="wmd-input-section"><span class="token p">Ruby will identify the bin from which the entry is to be mapped. Ruby uses first few bits of the hash as the bin index, Explaining the whole process is beyond the scope of this article. You can check the logic in MRI source code <a href="https://github.com/ruby/ruby/blob/trunk/st.c#L971"><span class="token link"><span class="token md md-bracket-start">here</span></span></a></span></span><span id="wmd-input-section-13012" class="wmd-input-section"></span>
<span id="wmd-input-section-13012" class="wmd-input-section"><span class="token p"><span class="token strong"><strong>Accessing element by key</strong></span> </span></span>
Let&#8217;s examine it with a sample hash **


<pre><code class="ruby">sample_hash = {a: 10, b: 20, c: 30, d: 40, e: 50}
</code></pre>


Here, ruby will create two internal arrays, entries and bins as shown below


<pre><code class="ruby">entries = [[2297,a,10], [451,b,20], [2782,c,30], [2432,d,40],[1896,e,50]]
</code></pre>


each record in entries array will contain a hash, key, and value respectively
Default bin size in ruby is 16 so Bins array for the above hash will be somewhat like this


<pre><code>bins = [
3,
nil,
nil,
nil,
1,
nil,
nil,
nil,
5,
0,
nil,
nil,
nil,
nil,
2,
nil
]
</code></pre>


Now what if we want to fetch an element from hash, say &#8216;c&#8217;


<pre><code class="ruby">sample_hash[:c]
</code></pre>


Step 1:
Find the hash using ruby&#8217;s internal hash function


<pre><code>:c.hash
2782
</code></pre>


Step 2
Find the location in bin array by using <a href="https://github.com/ruby/ruby/blob/trunk/st.c#L971">find bin method</a>


<pre class="lang:ruby decode:true">find_bin(2782)</pre>


Step 3
Find the location entries array using bin array


<pre class="lang:ruby decode:true">bins[14] =&gt; 2</pre>


&nbsp;
Step 4. Find the entry using the key we got


<pre class="lang:ruby decode:true">entries[2] =&gt; [2782,c,30]</pre>


&nbsp;
Now we have value for the key &#8216;c&#8217; without iterating through all the records


<h4 id="iclassicontrashideletinganitem"><i class="icon-trash"></i> Deleting an item</h4>


If the item to be deleted is first one, then ruby will change the index of &#8216;current first entry &#8216;, otherwise ruby will mark the item as reserved using a reserved hash value.
In the ruby source code, DELETED is marked using 0 and EMPTY is marked using 1.
To <strong>summarise</strong> this approach made hash implementation in ruby faster because the new bins array stores much smaller reference to the entries instead of storing entries themselves. Hence, it can take advantage of the modern processor caching levels
** Small hashes will use the linear search to find entries from ruby 2.0 onwards to avoid extra overhead and improve performance. Given example is just for reference only.
References


<ol>
 	

<li><a href="https://bugs.ruby-lang.org/issues/12142">https://bugs.ruby-lang.org/issues/12142</a></li>


 	

<li><a href="https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding#hash-changes">https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding#hash-changes</a></li>


 	

<li><a href="https://en.wikipedia.org/wiki/Hash_table">https://en.wikipedia.org/wiki/Hash_table</a></li>


 	

<li><a href="http://patshaughnessy.net/ruby-under-a-microscope">http://patshaughnessy.net/ruby-under-a-microscope</a></li>


</ol>

]]&gt;		</p>
]]></content:encoded>
							<wfw:commentRss>/behind-scenes-hash-table-performance-ruby-2-4/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
							</item>
		<item>
		<title>Different types of Index in PostgreSQL</title>
		<link>/different-types-index-postgresql/</link>
				<pubDate>Mon, 19 Sep 2016 07:38:45 +0000</pubDate>
		<dc:creator><![CDATA[coderhs]]></dc:creator>
				<category><![CDATA[Beginners]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[B Tree]]></category>
		<category><![CDATA[BRIN]]></category>
		<category><![CDATA[GiST]]></category>
		<category><![CDATA[Hash]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[part two]]></category>
		<category><![CDATA[series]]></category>
		<category><![CDATA[SP-GiST]]></category>
		<category><![CDATA[Speeing up database]]></category>

		<guid isPermaLink="false">https://redpanthers.co/?p=549</guid>
				<description><![CDATA[
				<![CDATA[]]>		]]></description>
								<content:encoded><![CDATA[<p>				<![CDATA[This is part two of our PostgreSQL optimization series. You can read the first article where we discuss when to index <a href="https://redpanthers.co/optimising-postgresql-database-query-using-indexes/">here</a>.
PostgreSQL uses a different set of algorithm while indexing tables, each type of algorithm is good for a certain set of data. Here we will be discussing the various algorithms available and when we should be using them. (Note these are the algorithms found in PostgreSQL 9.5)


<h2>Algorithms</h2>


<strong>B-Tree (Balance Tree),</strong> is the <strong>default</strong> algorithm used when we build indexes in Rails. It keeps a sorted copy of our column, which would be our index. So if we want to find the row of the word starting with <strong>a </strong>then as soon as the words starting with a are over. It will stop searching and return null, as the index has kept everything sorted. It is good in most cases, hence it is the default algorithm used.
<strong>Hash </strong>is one of the most popular indexing algorithms. But only the equate operator works on it, thus the query planner will only use an index with a hash algorithm if we do an equal operation searching for it. Another point to note is that Hash index is not WAL (Write Ahead Log) logged, so if the database crash we can&#8217;t rebuild the index and would need to REINDEX the entire column.
<strong>GIN</strong>, <strong>Generalized Inverted Indexing</strong> are great for indexing columns and expressions that contain an array, JSON, JSONB, etc. Internally, a <acronym class="ACRONYM">GIN</acronym> index contains a B-tree index constructed over keys, where each key is an element of one or more indexed items and where each tuple in a leaf page contains either a pointer to a B-tree of heap pointers.
<strong>GiST</strong>, <strong>Generalized Search Tree</strong> isn&#8217;t a single indexing scheme but rather an abstraction that makes it possible to implement indexing schemes for new data types by providing a balanced tree structure access method. In the past building and implementing custom indexing algorithm for custom data types include an understanding of the internals of the database. With the implementation of GiST, it provides an abstraction of the internal working which can be used to build your own indexing algorithm. It uses B-Tree internally, and thus we can use GiST to index IP address, Geo Location, etc.
<strong>SP-GiST</strong>, <strong>Space Partitioned  Generalized Search Tree</strong> &#8211; as the name suggest its GiST implementation itself but instead of balance tree structure we can use one of the non-balanced tree structure such as radix tree, quadtree, k-d tree.
<strong>BRIN, Block Range Indexes </strong>are designed to handle very large tables in which the rows’ natural sort order correlates to certain column values. For example, a table storing log entries might have a timestamp column for when each log entry was written. By using a BRIN index on this column, scanning large parts of the table can be avoided when querying rows by their timestamp value with very little overhead.
&nbsp;]]&gt;		</p>
]]></content:encoded>
										</item>
		<item>
		<title>Optimising PostgreSQL database query using indexes</title>
		<link>/optimising-postgresql-database-query-using-indexes/</link>
				<comments>/optimising-postgresql-database-query-using-indexes/#comments</comments>
				<pubDate>Thu, 11 Aug 2016 10:59:22 +0000</pubDate>
		<dc:creator><![CDATA[coderhs]]></dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[B Tree]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[GIN]]></category>
		<category><![CDATA[Guide]]></category>
		<category><![CDATA[Hash]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[migration]]></category>
		<category><![CDATA[multi column]]></category>
		<category><![CDATA[partial migration]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[single column]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">https://redpanthers.co/?p=389</guid>
				<description><![CDATA[
				<![CDATA[]]>		]]></description>
								<content:encoded><![CDATA[<p>				<![CDATA[At Red Panthers PostgreSQL is our go to database we use it everywhere. So thinking about how to optimize our database performance is one of the most talked about topic at our office. The best way to speed up report generation and data retrieval within a rails application is to leave it to the database, as they have algorithms and optimizations build just for that. We always felt that most Ruby on Rails projects out there, do not use the full potential of a database and they usually just limit it to a data store. PostgreSQL or any database for that matter is much more than that.
We would be blogging on how we use PostgreSQL in our projects to speed up our client's applications. This particle is the first part of a series of article we would be writing on database optimization.
<img class="aligncenter" src="http://i.imgur.com/YSZE83d.jpg?1" width="231" height="172" />
<strong>Database Indexes:</strong>
Indexes are a special lookup table that the database search engine can use to speed up data retrieval. An Index is similar to a pointer to a particular row of a table. As a real world example, consider a Britannica Encyclopedia with 22 volumes of books, and an extra book listing  the index,with which you can find out the item you are searching for in those 22 books.
PostgreSQL 9.5, provides several index algorithms like B-tree, Hash, GiST, SP-GiST and GIN. When you create an index using Ruby on Rails migration, by <strong>default it would be using the B-Tree</strong> migration. Whereas, as we move on to the indexing algorithms, we need to check into the general classification of an index and the data to be indexed.
<strong>Primary Keys</strong>
In rails, when we generate a model , by default an ID column would be added to the table. It would have integers values and they would be unique as well. By default, when you set a column as a primary key, the database would build a <em><strong>unique index </strong></em>on it. So we don&#8217;t need to add migration for it.
<strong>Foreign Keys</strong>
When you build a relationship between two tables you add a <code>foreign_key</code> in the child table to point to parent, eg: <code>user_id</code>, <code>group_id</code>. We need to query through this relationship a lot in rails, for example to load all the comments of post or all members of a group.<strong> So we need to index that for speed.</strong>.
If you are using some non id value to reference a table, lets say in your application ,you give all your users a unique URL which has the username. (Eg: http://csnipp.com/coderhs), in that case we would be using the username to query the data, so we need to have it indexed. In fact you should index all the columns you might be using in your where queries. Like if you are searching for users of a particular age or income frequents in your reports, then you should create an index for them as well.
<strong>Note:</strong>  What we explained above are single column indexes and multi column indexes. So if you are indexing just a single column in a table, its single column indexes.


<pre class="lang:pgsql decode:true">CREATE INDEX index_name
ON prices (user_id);</pre>


Rails code:


<pre class="lang:ruby decode:true">add_index :prices, :user_id</pre>


When we index multiple columns, they are called multi column index.


<pre class="lang:pgsql decode:true">CREATE INDEX index_name
ON user_views (user_id, article_id);</pre>


Rails code:


<pre class="lang:ruby decode:true">add_index :user_views, [:user_id, :article_id]</pre>


If you are joining two tables, using a column (which is not the already indexed foreign key) then you should index that as well.
<strong>State column &amp; Boolean column</strong>
State and Boolean column are columns that should be indexed as there would be a lot of rows but only limited number of values in those columns. Boolean column would have only true or false (two values)and state columns will have more than two like eg: draft, published, pending. The speed of load would be faster for these columns as they are only limited keys that can be placed in the index, and on a single lookup we can load them.
<strong>Partial indexes</strong> can be used in the above case, as the name suggests it&#8217;s an index over a subset of your table. The index would be building if it satisfies certain conditions. They can be most useful <strong>while writing scopes in rails. </strong>Lets say that you have a scope that picks up all the articles which are marked as SPAM. In your model you will write a scope like below


<pre class="lang:ruby decode:true">scope :articles, where(:spam =&gt; 'true')</pre>


So internally it&#8217;s a where query, one can add a partial index migration as follows:


<pre class="lang:pgsql decode:true">CREATE INDEX index_name
on articles (spam is true);</pre>


Rails:


<pre class="lang:ruby decode:true">add_index :articles, :spam, name: "index_articles_on_spam", where: "(spam IS true)"</pre>


<strong>When not to use indexes</strong>
Using indexes speeds up the SELECT and WHERE command, but it does slows down the execution of INSERT.
<strong>So avoid indexing when we have table that has a lot of insert and update</strong>
So we should avoid using Indexes when we have a table that holds a huge number of raw data, where we do a lot of batch updates and insert. For example, in an IoT application we would pipe all the data from multiple devices to a single table , summarize  and insert it into its proper tables. And  by a lot of data, I am referring to at least 10+ MB of data per minute. In most cases, we would just truncate that table after processing, hence it would slow us down if we were to index it.
<strong>If the table is too small and you know it will always be small</strong>
If you have a setting table which just stores the application settings, that can be modified by an admin panel. Then it doesn&#8217;t seem to be worth having an index there.
<strong>When you are manipulating the values of a column a lot</strong>
Lets say the particular value of a column gets changes extremely a lot, like the website view count. Then indexing it is not highly recommended.
Finally to complete this article. If you want to drop an index:
SQL


<pre class="lang:pgsql decode:true">DROP INDEX index_name;</pre>


Rails


<pre class="lang:ruby decode:true">remove_index :books, :created_at</pre>


<strong>Summary</strong>:


<ul>
 	

<li>Index Primary key</li>


 	

<li>Index Foreign key</li>


 	

<li>Index all columns you would be passing into where clause</li>


 	

<li>Index the keys used to Join tables</li>


 	

<li>Index the date column (if you are going to call it frequent, like rankings of a particular date)</li>


 	

<li>Index the type column in an STI or polymorphism.</li>


 	

<li>Add partial index to scopes</li>


 	

<li>Do not index tables with a lot of read, write</li>


 	

<li>Do not index tables you know that will remain small, all through out its life time</li>


 	

<li>Do not index columns where you will be manipulating lot of its values.</li>


</ul>


Keep visiting here to know more about the PostgreSQL indexing algorithms in part 2 of this article.
<strong>References:</strong>
<a href="https://www.postgresql.org/docs/9.2/static/indexes-types.html">https://www.postgresql.org/docs/9.2/static/indexes-types.html</a>
<a href="http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html">http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html</a>
<a href="http://www.tutorialspoint.com/postgresql/postgresql_indexes.htm">http://www.tutorialspoint.com/postgresql/postgresql_indexes.htm</a>]]&gt;		</p>
]]></content:encoded>
							<wfw:commentRss>/optimising-postgresql-database-query-using-indexes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
							</item>
	</channel>
</rss>
