Garbage Collection in kdb+

Blog kdb+ 6 Nov 2014

Data Intellect

The aim of this article is to give an understanding of how kdb+ uses and releases memory, and of the options available to modify that behaviour.

kdb+ allocates memory in powers of 2. A vector is always placed into a memory block whose size is the next power of two up from the raw data size plus some header information. For example, a vector of 8000 8-byte long integers has a raw size of 64000 bytes, but it requires a memory block of size 2^16 = 65536 bytes. We can demonstrate this with the \ts system command, which shows the time in milliseconds (first result) and the space in bytes (second result) required for an operation.

q)\ts til 8000
0 65712

If we instead create a vector of 9000 8-byte long integers (72000 bytes of raw data), kdb+ uses the next power of two up, 2^17 = 131072 bytes, to store the data.

q)\ts til 9000
0 131248

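The actual boundary doesn’t quite lie at 8192 (2^13) elements but at 8190, because the header information also has to fit in the block (8190*8 = 65520 bytes, which leaves 16 bytes of the 2^16 block for the header):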

q)\ts til 8190
0 65712
q)\ts til 8191
0 131248
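The boundary can be reproduced with a little arithmetic. The sketch below assumes a 16-byte vector header, a figure inferred from the 8190 boundary observed above rather than taken from kdb+ documentation; blocksize is a hypothetical helper name, not a built-in.

```q
/ blocksize: hypothetical helper, smallest power-of-two block (in bytes) that fits n longs
/ assumes a 16-byte header, inferred from the boundary observed above
blocksize:{[n] b:16+8*n; (b>)(2*)/1}   / keep doubling from 1 while the block is smaller than b
r:blocksize each 8000 8190 8191 9000   / 65536 65536 131072 131072
```

The block sizes differ from the \ts figures by a small constant because \ts measures the whole operation, not just the final vector.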

It is common in kdb+ systems for vectors to grow (e.g. rows being inserted into a table). If a vector of 8188 longs is grown by 1 element at a time, the concatenations are cheap until the boundary is hit, at which point a new memory block must be allocated.

q)a:til 8188
q)\ts a,:1
0 400
q)\ts a,:1
0 400
q)\ts a,:1
0 131232
q)\ts a,:1
0 368

In the example above, when the boundary of 8190 elements is exceeded, a memory block of size 2^17 is allocated and the value of a is copied into it. The old block of size 2^16 is placed on the heap to be recycled internally.

The power-of-2 allocation approach gives excellent performance, but it can make kdb+ “memory hungry”: in the worst case, the database will require about twice as much memory as the raw data. However, this disadvantage can be managed using the garbage collection flag -g and the built-in kdb+ function .Q.gc[].
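As a rough illustration of the best and worst cases, the ratio of block size to raw data size can be worked out from the figures in the boundary example. This is plain arithmetic on the numbers already shown, and the names best and worst are only for illustration.

```q
/ block size divided by raw data size, using the boundary figures for long vectors
best:65536%8*8190     / 8190 longs almost fill a 2^16 block: ratio just over 1
worst:131072%8*8191   / one more element spills into a 2^17 block: ratio just over 2
```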

Under normal circumstances (-g 0 is the default), unused memory blocks are not released back to the operating system but are retained and recycled internally. Switching garbage collection to immediate mode (-g 1) means that any large memory blocks (>32 MB) freed by the process are returned immediately to the operating system. Invoking .Q.gc[] (irrespective of the -g setting) will return any large memory blocks to the operating system and also attempt to coalesce smaller memory blocks into large blocks that can be returned. A small-scale example of these operations is outlined below. Note that the built-in .Q.w[] function can be used to retrieve the memory stats of a q process in a readable form.

q).Q.w[]
used| 118384
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754
  • used is the subset of heap which is currently being used.
  • heap is the memory allocated by the OS to the q process.
  • peak is the largest amount of memory that the q process has been allocated.
  • wmax is the memory limit enforced by the -w command line flag.
  • mmap is the mapped memory in use.
  • mphy is the size of the physical memory on the host.
  • syms is the number of symbols which have been created (internalized) in the process.
  • symw is the amount of memory used by the created symbols.
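Since .Q.w[] returns a dictionary, individual stats can be picked out and combined. The sketch below computes the memory that the process holds but is not currently using; slack is a hypothetical helper name, not part of kdb+.

```q
/ slack: hypothetical helper, bytes allocated to the process (heap) but not currently in use
slack:{w:.Q.w[]; w[`heap]-w`used}
s:slack[]
```

A large value here suggests there is memory that garbage collection may be able to hand back to the operating system.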

We can monitor how these values change as memory is used:

q)a:til 10000000
q).Q.w[]
used| 134336160
heap| 201326592
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754

Running the operation ‘a:til 10000000’ has required the heap to be increased to approximately 200 MB (201326592 bytes).

q)delete a from `.
`.
q).Q.w[]
used| 118400
heap| 201326592
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754

Here, deleting ‘a’ has reduced the memory currently in use (used), but not the memory allocated to the q process (heap). This may not be ideal, as the process is holding on to resources it does not immediately need. Fortunately, kdb+ has a way of managing this in the form of garbage collection.
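In the default mode, the retained memory can be handed back by calling .Q.gc[] explicitly. The following is a minimal sketch of that sequence, assuming a fresh q session; the variable names before, freed and after are only for illustration.

```q
a:til 10000000
delete a from `.;
before:.Q.w[]`heap     / heap is still enlarged after the delete
freed:.Q.gc[]          / returns the number of bytes handed back to the OS
after:.Q.w[]`heap      / heap after collection
```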

Let’s look at the same values if garbage collection has been set to immediate.

C:\q>q -g 1
KDB+ 3.2 2014.11.01 Copyright (C) 1993-2014 Kx Systems
q)a:til 10000000
q).Q.w[]
used| 134336160
heap| 201326592
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754
q)delete a from `.
`.
q).Q.w[]
used| 118400
heap| 67108864
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754

In the example above, we can see that the heap has been reduced as soon as ‘a’ has been deleted. It’s as simple as that! Well, almost. There is a caveat here. Setting -g to 1 doesn’t automatically clear everything. For example, if we start out as before:

C:\q>q -g 1
KDB+ 3.2 2014.11.01 Copyright (C) 1993-2014 Kx Systems

And then perform the following operations before checking memory usage again:

q)a:upper -10000?`4
q){@[`.;x;:;til 3000]} each a;
q).Q.w[]
used| 327995136
heap| 335544320
peak| 335544320
wmax| 0
mmap| 0
mphy| 2036150272
syms| 20564
symw| 858712

We can see that the used and heap memory have increased (along with the number of symbols and their size within the process).  Now, if we delete what we have just created, we get the following results:

q){value"delete ",(string x)," from `."}each a;
q).Q.w[]
used| 184128
heap| 335544320
peak| 335544320
wmax| 0
mmap| 0
mphy| 2036150272
syms| 20564
symw| 858712

You can see that the used memory has decreased, but the heap remains high. The problem here is that the “-g 1” method doesn’t release objects of size <= 32 MB, and the assignment has created lots of objects below this size. In cases like these, garbage collection has to be run manually:

q).Q.gc[]
268435456
q).Q.w[]
used| 184128
heap| 67108864
peak| 335544320
wmax| 0
mmap| 0
mphy| 2036150272
syms| 20565
symw| 858742

This has reduced the heap back down to its initial level.
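Because manual collection takes time, one option is to call .Q.gc[] only when enough unused memory has built up. The sketch below shows one way this might be done; gcif and its threshold argument are hypothetical, and the 100000000-byte threshold is an arbitrary value chosen for illustration.

```q
/ gcif: hypothetical helper, runs .Q.gc[] only when unused heap exceeds thresh bytes
gcif:{[thresh] w:.Q.w[]; $[thresh<w[`heap]-w`used; .Q.gc[]; 0]}
released:gcif 100000000   / bytes released, or 0 if collection was skipped
```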

Does your project have a garbage collection problem that needs to be solved? Let AquaQ consultants take out the trash…
