Archive for October, 2002

How to do Interprocess Communication in k (k3 that is)

Tuesday, October 15th, 2002

Start Listening

To start a k server listening on a port, run up a k session, and invoke

\m i 2001

which will set it to listen on port 2001.

Stop Listening

To stop listening, invoke

\m i 0

Amend

As Stevan Apter has mentioned on the k listbox, the key to k ipc is to see that the default message processing operation is the dot operator

.

So assuming we have a vector, a which contains !10, e.g.

a:!10

Using the familiar assignment operator : we can assign the value of a to another variable v in the same instance of k as follows

v:a

We can also achieve that using the dot operator as the amend verb (see Amend, K Reference Manual)

.[`v;();:;a]

or

.(`v;();:;a)

And so to set the value v on a remote server, using the local value of a, we can write

h:3:`"localhost",2001 / Opens a connection to localhost, port 2001

h 3:(`v;();:;a) / Assign remote v the value of local a

3:h / Close the connection

Note that we used the 3: operator to send the data. This is the asynchronous send mode, and it will not block the local instance of k. We do not expect a result from this execution, so there is no need to wait for the remote invocation to complete.

Note that this can also be achieved with

h 3:"v:",5:a

i.e. send the request over as text, but this can be horribly inefficient.

Suppose we only want to set an element of v to be a, e.g.

v[i]:a

This can be achieved via

.[`v;i;:;a]

or

.(`v;i;:;a)

And to do this on a remote server, using the local values i and a, assuming a valid connection handle h

h 3:(`v;i;:;a)

Suppose we would like to do

v +:a

which can be achieved via

.[`v;();+;a]

or

.(`v;();+;a)

And to do this on a remote server, using the local value of a, assuming a valid connection handle h

h 3:(`v;();+;a)

+ in this case can be replaced by any primitive, e.g. -, *, % etc.

And to conclude the dot section, consider

v[i]+:a

which can be achieved via

.[`v;i;+;a]

or

.(`v;i;+;a)

And to do this on a remote server, using the local value of i and a, assuming a valid connection handle h

h 3:(`v;i;+;a)
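The four amend forms above can be modelled outside k. What follows is a rough Python sketch of the semantics only, not of k itself: the names amend, assign and vec are hypothetical, a dict stands in for the k workspace, and None stands in for the empty index ().

```python
import operator

def assign(old, new):
    """Model of ':' used as the amend function: discard the old value."""
    return new

def vec(fn):
    """Lift a two-argument function so it also applies item by item to lists."""
    def lifted(old, new):
        if isinstance(old, list) and isinstance(new, list):
            return [fn(o, n) for o, n in zip(old, new)]
        if isinstance(old, list):
            return [fn(o, new) for o in old]
        return fn(old, new)
    return lifted

def amend(env, name, index, fn, value):
    """Rough model of .[`name;index;fn;value] against a dict of variables.

    index=None plays the role of () and means 'the whole variable'.
    """
    if index is None:
        env[name] = fn(env.get(name), value)
    else:
        items = list(env[name])
        items[index] = fn(items[index], value)
        env[name] = items
    return env[name]

env = {}
a = list(range(10))                           # a:!10
amend(env, "v", None, assign, a)              # .[`v;();:;a]
amend(env, "v", None, vec(operator.add), a)   # .[`v;();+;a]
amend(env, "v", 3, operator.add, 100)         # .[`v;3;+;100]
print(env["v"])
```

The point of the model is that one four-item message covers plain assignment, indexed assignment and in-place update, which is why a single default message handler is enough on the k server.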

This style of call using the amend verb will only work with k servers. Unfortunately, k and kdb have different message handlers (.m.g), because k and ksql are different languages. This means that we cannot get at the amend verb when talking to a kdb server. We can however issue ksql, e.g.

r:h 4:(":";(`v;a))

which will set the value v remotely to the local value of a. More complex ksql IPC can be thought up, e.g.

tablename:`trade
r:h 4:(`.d.r;(". select count $ from ?";,`tablename))

What about function calls? Consider the local invocation of a function f, with the parameters a, b and c, returning a value and assigning that to the variable r, i.e.

r:`f[a;b;c]

This can be written as

r:.(`f;(a;b;c))

And to execute the remote function f, using local parameters a,b and c, assuming a valid connection handle h

r:h 4:(`f;(a;b;c))

Function f must exist on the remote instance.

Notice that in this case, as we are expecting a result to be returned from the function, we use the synchronous send mode, 4:. This will cause the local instance of k to block until the remote instance has completed the invocation of the function and returned the result.

What type of server are you?
If you are unsure what kind of server you are connected to, you can issue a command that will work on both systems, and return different answers depending on whether it is k or kdb. e.g.

h 4:"1%1"

returns a float if the remote server is k, and returns an int if the remote server is kdb. This is because % means divide in k, and means mod in ksql. It is worth noting that the k ticker plants have the features of kdb but actually have a k message processor (instead of ksql) on the IPC.
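The 1%1 probe can be turned into a small classifier on the client side. This is a minimal Python sketch, assuming a client library that hands k floats back as Python float and k ints as Python int; server_kind is a hypothetical name.

```python
def server_kind(probe_result):
    """Classify the remote server from the result of sending "1%1".

    Assumes floats arrive as Python float and ints as Python int.
    """
    if isinstance(probe_result, bool):
        raise TypeError("unexpected probe result: %r" % (probe_result,))
    if isinstance(probe_result, float):
        return "k"      # % was evaluated as divide
    if isinstance(probe_result, int):
        return "kdb"    # % was evaluated as mod
    raise TypeError("unexpected probe result: %r" % (probe_result,))

print(server_kind(1.0), server_kind(0))
```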

Error Trapping
Error trapping in k involves wrapping up the k to be executed using the Apply verb. The result is a 2 item list, the first item being 0 (success) or 1 (failure). The second item contains either the expected result, as per normal execution outside of the error trap, or the failure message as a char vector. For further info on k error trapping please see the K Reference Manual, Apply verb.

IPC requests can be error trapped as follows

Attempt to open a connection

r:@[3::;(`localhost;2001);:]

Attempt to get a table list

r:.[4:;(handle;"tables");:]

Attempt to get the rowcount of tablename

r:.[4:;(h;(`.d.r;,("select count $ from ?";,tablename)));:]
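The shape of a trapped result, a flag followed by either the value or the message, can be illustrated in Python. This is only an analogy for the 2 item list described above; trap is a hypothetical helper, and Python exceptions stand in for k errors.

```python
def trap(fn, *args):
    """Call fn(*args) and return (0, result) on success or (1, message) on failure."""
    try:
        return (0, fn(*args))
    except Exception as exc:   # any failure becomes a (1, text) pair
        return (1, str(exc))

def divide(x, y):
    return x // y

flag, value = trap(divide, 6, 3)      # succeeds
print(flag, value)
flag, message = trap(divide, 1, 0)    # fails, message holds the error text
print(flag, message)
```

The caller always gets a pair back and checks the first item before touching the second, which is the same pattern as testing the first item of r after a trapped 3: or 4: call.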

Authentication and Access Control
In k there is an Authorization Vector, .m.u that contains the names of users that are permitted to connect to the process in which .m.u is defined.

e.g. on the server, define

.m.u:,`charlie

which means that only processes identifying themselves as charlie will be allowed to connect, and then on the client (assuming that you are not actually running as user charlie!), try to connect and get a table listing

h:3:`,2001
h 4:"tables"
index error
h 4:"tables"
^

What has actually happened here is that when the client connected, it implicitly sent the username that the client process was running under to the server. The server checked whether this name was in .m.u and, as it was not, disconnected the client immediately (you can see this through detecting the closing of connections, below). However, the disconnect was not detected until we tried to use the handle again by trying to get the list of tables.
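The delayed failure described above can be modelled in a few lines. This is a toy Python sketch, not the k implementation: Server and Handle are hypothetical classes, and the allowed_users set plays the role of .m.u.

```python
class Handle:
    """Client-side handle; it does not know the server has already hung up."""
    def __init__(self, is_open):
        self.is_open = is_open

    def send(self, message):
        if not self.is_open:
            raise ConnectionError("handle is no longer valid")
        return "ok"

class Server:
    def __init__(self, allowed_users):
        self.allowed_users = set(allowed_users)   # plays the role of .m.u

    def connect(self, username):
        # The connect itself always appears to succeed. A user who is not
        # in the list is dropped straight away, which the client only
        # notices on its next use of the handle.
        return Handle(is_open=username in self.allowed_users)

server = Server(["charlie"])
h = server.connect("alice")        # no error here
try:
    h.send("tables")
except ConnectionError as err:
    print("first use fails:", err)
```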

In kdb there is an additional level of access control. It is configured through 2 tables

user:([user]password)
access:([access,var,user])

e.g. start with an empty table

user:([user:()]password:())

then

'user'insert('admin','pw310')

If there is no user table everyone is allowed. If there is no access table everything is allowed.

Clients can authenticate through

http://user:password@host:port/?query
KDBC: h:3:`host,port; h 4:"user:password"
ODBC: DBQ=//host:port;UID=…;PWD=…
JDBC: Properties p=new Properties();p.put("user",…);p.put("password",…);
Connection c=DriverManager.getConnection("jdbc:kx://host:port",p);

If a server is expecting a client to authenticate, then the client should send the following as the first kdb command

username:password

as a char vector. e.g.

h:3:`"localhost",2001

h 4:"admin:pw310"

If the user is not permissioned, a `user error will be thrown.
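The check on the first message can be sketched as follows. This is a minimal Python model under stated assumptions: authenticate is a hypothetical name, the user table is a plain dict of user to password, and rejecting everyone when the table exists but is empty is my assumption rather than documented behaviour.

```python
def authenticate(first_message, user_table=None):
    """Model of the kdb login check on the first message of a connection.

    user_table=None means no user table is defined, so everyone is allowed.
    Otherwise first_message must be "username:password" matching the table.
    """
    if user_table is None:
        return True
    user, sep, password = first_message.partition(":")
    if not sep or user_table.get(user) != password:
        raise PermissionError("user")   # stands in for the `user error
    return True

users = {"admin": "pw310"}
print(authenticate("admin:pw310", users))
print(authenticate("anything at all"))      # no user table defined
```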

If this does not give sufficient access control, one can always insert access control in the message filters on the server. See Tracing IPC and Message Filters, below.

Dropped/Closed Connection Detection
When a connection is dropped or gracefully closed by the remote party, the character string .m.c is automatically executed. The handle associated with the dropped connection is available as _w. e.g. define your dropped connection handler as

.m.c:"`0:\"Closed \",($_w),\"\\n\""

and then set up a connection from a client, and close the connection from the client.

What is the context?
When a remote user is executing code on your server, the username used to establish the IPC connection is available as a symbol in _u. e.g.

_u
`cskelton

As is the ip address of the remote side of the connection - this is in _a and can be formatted as

`$1_,/".",'$256 _vs _a
`"192.168.1.5"
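The same base-256 split can be written in Python. This is a sketch assuming the address is held as a 32-bit integer with the most significant byte first, matching the order that 256 _vs produces; format_ip is a hypothetical name, and the mask is there in case the value arrives as a negative signed int.

```python
def format_ip(a):
    """Split a 32-bit address into four base-256 digits and join them with dots."""
    a &= 0xFFFFFFFF                      # treat a negative signed int as unsigned
    octets = [(a >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)

print(format_ip(3232235781))
```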

The handle of the connection is also available, in _w, e.g. on the server one might see this

\m i 2001
fn:{`0:$_w}
1840

when the client runs this

h:3:`,2001
h 4:(`fn;)

Tracing IPC and Message Filters
One can trace incoming IPC messages in kdb by setting

.d.DF:1

IPC can also be traced by intercepting the messages by overriding the message filters, .m.g and .m.s.

.m.g is invoked when a 4: request is received. .m.s is invoked when a 3: request is received. If you override .m.g as

oldmg:.m.g

.m.g:{[x] `0:"Received:",(5:x),"\n";oldmg[x]}

then you should see all incoming 4: requests being printed in the console. One advantage of this is that you will see all incoming messages regardless of whether they are about to cause an error or not. This can be very handy when debugging rogue clients.
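The override above is a wrap-and-delegate pattern: keep the old handler, log, then pass the message on. A Python sketch of the same idea follows, with hypothetical names (with_logging, old_handler) and repr standing in for 5:.

```python
def with_logging(handler, log=print):
    """Return a handler that logs each incoming message, then delegates."""
    def wrapped(message):
        log("Received:" + repr(message))   # logged before the handler can fail
        return handler(message)
    return wrapped

def old_handler(message):
    # Stand-in for the original message handler.
    return message.upper()

handler = with_logging(old_handler)
print(handler("tables"))
```

Because the log call runs before delegation, a message that makes the original handler fail is still recorded, which is the debugging advantage noted above.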

IPC Message Structure
The IPC message structure is simply the k data types in a serialised form, with a short header describing the message type and endian system used for encoding.

The message header format is as follows

byte offset   0            1   2   3              4-7
contents      endianness   0   0   message type   message length

Endian can be either 0 (big endian) or 1 (little endian).
Message type can be either 0 (async), 1 (sync) or 2 (response). In k, these message types arise as follows

handle 3:”a:!10″ / set a to !10 on the remote server, using async mode, 3:

In that case, the outgoing msg has msg type 0 (async)

handle 4:”!10″ / evaluate !10 on the remote server, send the results back, using sync mode, 4:

In that case, the outgoing message has msg type 1 (sync), and the resulting incoming message has msg type 2 (response)

In Java the IPC message can be read as follows

// socket, b, msgOffset and architecture are assumed to be fields of the enclosing class;
// readInteger() and readMessage() are assumed to read from b starting at msgOffset.
public synchronized Object k() throws IOException, KServerException
{
    b = new byte[8];                 // the message header is 8 bytes long

    DataInputStream dis = new DataInputStream(socket.getInputStream());

    dis.readFully(b);                // read the 8 header bytes from the stream

    architecture = b[0] == 1;        // little endian if b[0] is 1, big endian if b[0] is 0
    msgOffset = 4;                   // skip the next 3 bytes - don't worry about message type
    int msgLength = readInteger();   // now read the integer contained in b[4..7]
    b = new byte[msgLength];         // allocate an array to hold the complete message payload
    dis.readFully(b);                // read msgLength bytes from the input stream
    msgOffset = 0;

    return readMessage();            // and deserialise the message payload
}
So the complete message format is header+serialised data.
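For comparison, the header can be decoded with Python's struct module. This is a sketch under the layout given above (endianness at offset 0, message type at offset 3, length in bytes 4 to 7); treating the length as a signed 4 byte integer in the sender's byte order is my assumption, and parse_header is a hypothetical name.

```python
import struct

MESSAGE_TYPES = {0: "async", 1: "sync", 2: "response"}

def parse_header(header):
    """Decode an 8 byte header into (little_endian, message_type, payload_length)."""
    if len(header) != 8:
        raise ValueError("header must be exactly 8 bytes")
    little_endian = header[0] == 1
    message_type = MESSAGE_TYPES.get(header[3], "unknown")
    fmt = "<i" if little_endian else ">i"
    (length,) = struct.unpack(fmt, header[4:8])
    return little_endian, message_type, length

sample = bytes([1, 0, 0, 1]) + struct.pack("<i", 16)
print(parse_header(sample))
```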

The serialised data is simply encoded according to whether it is an atom or a vector. Atoms are encoded as the type followed by the data, e.g. an integer of value 12005 is encoded as int[]{1,12005}. It is possible to see what the type number means by using the 4: operator in k, e.g. 4:100 results in type 1, an integer, and 4:100.0 results in type 2, a float.

An int is encoded as 4 bytes, so the encoding of this integer actually takes up 8 bytes - 4 for the type, and 4 for the actual int value.

The data is 8 byte aligned, which is fine for the encoding of an int - it takes up 8 bytes so it is aligned by default.

However, an encoding of a k float of value 12005.0 results in int[]{2,0} followed by 8 bytes representing the double 12005. The encoding used for k floats is the IEEE Standard 754 Floating-Point. This is 8 byte aligned so no further padding is required.
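The int and float atom layouts just described can be reproduced with Python's struct module. This is a sketch following the text only: the function names are hypothetical, and little endian is chosen as the default purely for illustration.

```python
import struct

def encode_int_atom(value, little_endian=True):
    """Type 1 followed by a 4 byte int: 8 bytes in total."""
    order = "<" if little_endian else ">"
    return struct.pack(order + "ii", 1, value)

def encode_float_atom(value, little_endian=True):
    """Type 2, a 4 byte zero pad, then an 8 byte IEEE 754 double: 16 bytes."""
    order = "<" if little_endian else ">"
    return struct.pack(order + "iid", 2, 0, value)

print(len(encode_int_atom(12005)), len(encode_float_atom(12005.0)))
```

The zero pad after the float's type is what keeps the double on an 8 byte boundary.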

The encoding of a k char is an integer representing the type, in this case 3 (try 4:"a" at the k prompt), followed by a single byte representing the char. The charset encoding used is plain ASCII. This is 8 byte aligned so no further padding is required.

The encoding of a k symbol is an integer representing the type, in this case 4 (try 4:`"hello" at the k prompt), followed by a number of bytes, each representing one char from the symbol. It is null-terminated, i.e. the end of the symbol is signalled by a value of 0. The charset encoding used is plain ASCII. As this data is not necessarily 8 byte aligned (the symbol can be of any length), we must pad the remaining space to the next 8 byte border to regain alignment.
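The symbol atom rule (type, ASCII bytes, null terminator, then padding up to the next 8 byte border) can be sketched in Python as below. The function name is hypothetical, and using zero bytes for the padding is an assumption.

```python
import struct

def encode_symbol_atom(name, little_endian=True):
    """Type 4, the ASCII bytes, a null terminator, then zero padding to 8 bytes."""
    order = "<" if little_endian else ">"
    body = struct.pack(order + "i", 4) + name.encode("ascii") + b"\x00"
    padding = -len(body) % 8            # bytes needed to reach the next 8 byte border
    return body + b"\x00" * padding

encoded = encode_symbol_atom("hello")
print(len(encoded), encoded)
```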

Encoding of vectors is much the same, except that the length of the vector is inserted between the type and the data. For example, a vector of the symbols `"MSFT" `"IBM" would be encoded as

-4,2,77,83,70,84,0,73,66,77,0,0

Note the extra 0 on the end to realign to an 8 byte boundary.

There is a bit of handshaking that goes on when a connection is established. It is simply that the side which starts the connection sends an 8 byte message which is the username to be associated with the connection. In the server, the username for the context can be seen via _u, and restricted by the .m.u vector.

A short introduction to k, kdb and kdb+

Sunday, October 13th, 2002

The k language

The k language was originally developed by Arthur Whitney, the CTO and Co-Founder of Kx Systems Inc, and a very influential member of the APL community. Prior to founding Kx, Mr. Whitney was a Managing Director of Union Bank of Switzerland (UBS) in New York - from 1994 to 1997 UBS purchased exclusive rights for the use of K. Whilst at UBS, he led an internal team that developed global trading and risk management systems using the k language. Earlier, Mr. Whitney was at Morgan Stanley, where he developed the A+ programming language, used to build trading systems, databases and analytics for equities and fixed income. Mr. Whitney studied set theory, foundations and computational complexity at the University of Toronto and Stanford.


k is an interpreted vector/list based language that runs on Windows, Linux and Solaris. Evaluation versions can be downloaded from the Kx website.

k’s features include

Types - atoms or vectors of integer,float,char,symbol. Function, Dictionary, Generic list.

Functions
Assignment
A rich set of operators
Conditional statements
Scoping
Input/Output - console/disk/network
Memory mapped files
Garbage Collection
Network communications

k forms the basis of the ultrafast database kdb.

kdb
kdb is 100% pure k. It is an ultrafast database that is used by many investment banks and insurance companies world-wide. It can be seamlessly extended using the k language, or the c language - other languages should be possible if they can bind to the k libs. There is a jdbc driver, and client interfaces using .NET, or c are available.

kdb supports ksql and sql. ksql is a highly optimised sql that executes extraordinarily quickly. kdb is very scalable, and handles hundreds of gigabytes of market data in production systems for many companies.

An example of how easy it is to extend kdb - here is a sum function

In k, define a function that takes one parameter, which is a vector, and return the sum over all the elements

mysum:{[v] +/v}

This can then be used in ksql as follows

select mysum[am_size] from trade

There is already a sum function in ksql, and the above is merely to demonstrate how easy it is to extend kdb.
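For readers who do not know k, +/v applies + between successive items of v. A rough Python analogue is sketched below, purely for comparison; returning 0 for an empty vector is an assumption of this sketch.

```python
from functools import reduce
import operator

def mysum(v):
    """Fold + over the items of v, as +/v does in k."""
    return reduce(operator.add, v, 0)

print(mysum([100, 250, 75]))
```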

Stored procedures in kdb are simply k functions, e.g. a parameterised stored procedure that gets the volume traded for Vodafone on the London Stock Exchange between 11:00 and 11:04 could be written as this in k -

getVolume:{[sym;start;end]
  :.d.r[(". select sum[am_size] from trade where sym=?,time>?,time<?";(sym;start;end))]
}

and invoked from ksql as

getVolume['VOD.L',time['11:00'],time['11:04']]

returning a float. It is really that simple.

So why use it? Ok, it’s easy to extend. The killer feature is that it is unbelievably fast. Fast on getting data in, and getting data out. I’ve seen systems inserting sustained rates of several thousand rows of data per second, and performing queries and getting data out in tens of milliseconds. The system is extremely robust and scalable. Plus it is supported worldwide by a large team of qualified consultants.

kdb+
kdb+ is the next product in the evolution of k/kdb - a 64bit system with a new language. Instead of there being 2 separate languages, k and ksql have been unified into the language q. There are many more types available, and the best core ideas that made k and kdb so successful are still in there. If APL is an acronym for “A Programming Language”, then q should probably be renamed TPL, an acronym for “THE programming language”. kdb+ is not new - it had been in the works for many years before its release in 2003, and already has an impressive installation base amongst investment banks.

All of these products are supported worldwide by Kx Systems' strategic partner for Sales and Support - First Derivatives Plc.