Technically it's just a wrapper around RocksDB. It has many similarities to relational databases, namely tables and SQL-style "select ... from ... where" thinking.
So why use it and when?
Linqdb supports rapid development with simple code and efficient operations. It is easy to scale Linqdb horizontally.
Some scenarios are a very good fit for Linqdb, for example when you have one writer that dumps a lot of data and many readers that use that data from different angles.
An entity (a table, a collection) is defined by a class. The class must have an int Id property: 0 indicates a new entity, any other value identifies the entity to be updated. Supported data types are: int, double, DateTime (and their nullables), byte[] and string. (Note that bool is not supported, use int instead.) For example, we'll use:
public class SomeData
{
public int Id { get; set; }
public int PeriodId { get; set; }
public int? ObjectId { get; set; }
public double Value { get; set; }
public double? Normalized { get; set; }
public int? PersonId { get; set; }
public DateTime? Date { get; set; }
public string Name { get; set; }
public string NameSearch { get; set; }
}
Then, to save a new record (the table will be created automatically, based on the type's name, if it doesn't exist yet):
using LinqDb;
...
var db = new Db("DATA");
var d = new SomeData()
{
Id = 0,
Normalized = 1.2,
PeriodId = 5
};
db.Table<SomeData>().Save(d);
db.Dispose();
To save a bulk of data more efficiently:
var list = new List<SomeData>();
...
db.Table<SomeData>().SaveBatch(list);
Select
Select everything from a table:
List<SomeData> res = db.Table<SomeData>()
.SelectEntity();
To select only some columns:
var res = db.Table<SomeData>()
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Note that only anonymous types are supported in this case; res is a List<AnonymousType>.
There is a limit on the amount of data that can be selected at once. If you need to select a large amount, there are non-atomic selects which do the work in iterations. The drawback of this method is that the data may change between iterations, so the result might not match the initial select conditions.
Where
Linqdb supports the Where clause:
var res = db.Table<SomeData>()
.Where(f => f.Normalized == 2.3 || f.Date > DateTime.Now && f.PersonId == 5)
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Where supports these operators: && || == >= > < <= !=. On the left-hand side of an operator there must be a plain property without any expression (i.e. f.PersonId % 2 == 0 won't work); on the right-hand side, a constant/variable/expression.
Types on both sides must match exactly (i.e. int != int?) and casting is only possible on the right side.
An empty string and an empty array are treated as null.
Between
To evaluate something like .Where(f => f.Normalized > 3 && f.Normalized < 10) Linqdb iterates the data twice: first it finds all values > 3, then all values < 10, and finally returns their intersection. So in this case it's faster to use .Between(f => f.Normalized, 3, 10, BetweenBoundaries.BothExclusive), which scans the data only once: from value 3 to 10.
var res = db.Table<SomeData>()
.Between(f => f.Normalized, 3, 10, BetweenBoundaries.BothExclusive)
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Intersect
Finds the intersection of the values in a table column with a given set:
var res = db.Table<SomeData>()
.Intersect(f => f.Normalized, new HashSet<double>() { 10, 20, 30 })
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Order, skip, take
Ordering (by a single column) is supported:
var res = db.Table<SomeData>()
.OrderBy(f => f.Normalized)
.Skip(10)
.Take(10)
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Ordering by Id is the fastest.
There are also overloads of Select and SelectEntity that take an object of type LinqdbSelectStatistics. This object will be populated with the number of records that satisfied the conditions (handy with .Skip and .Take when the total number is also needed). Its SearchedPercentile property shows how much SearchTimeLimited managed to search in the given time.
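For example, a paged query that also needs the total match count might be sketched like this (an illustration only: the exact overload shape and the members of LinqdbSelectStatistics should be checked against the library):

```csharp
// Sketch: SelectEntity is assumed to accept a statistics object
// that it populates with the total number of matching records.
var stats = new LinqdbSelectStatistics();
var page = db.Table<SomeData>()
             .Where(f => f.Normalized > 0)
             .OrderBy(f => f.Id)
             .Skip(20)
             .Take(10)
             .SelectEntity(stats);
// stats now reports how many records matched the Where,
// independent of Skip/Take, so paging UIs can show a total.
```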
Search
Linqdb supports simple full-text search on string columns whose names end with "Search" (or "SearchS"). For example,
var res = db.Table<SomeData>()
.Search(f => f.NameSearch, "some text")
.Select(f => new
{
Id = f.Id,
Name = f.NameSearch
});
will find rows where the column NameSearch contains both "some" and "text".
.Search takes optional parameters start_step and steps which enable searching only a part of the documents.
This is handy when you have lots of data and want to return some results as fast as possible.
The search will only happen on a slice of the data: for example, start_step = 0, steps = 1 means only the first 1000 documents will be searched; start_step = 10, steps = 5 means 5000 documents will be searched starting from document 10000.
You can get the total number of steps using the LastStep function.
You can also limit the search time by using SearchTimeLimited: it will search as much as it can in the given time (how much was searched is reported in the LinqdbSelectStatistics object's SearchedPercentile property).
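A stepped and a time-limited search might look roughly like this (a sketch based on the description above; the parameter order of the Search overload and the SearchTimeLimited signature are assumptions):

```csharp
// Search only the first step (roughly the first 1000 documents):
var quick = db.Table<SomeData>()
              .Search(f => f.NameSearch, "some text", 0, 1) // assumed: start_step, steps
              .Select(f => new { Id = f.Id, Name = f.NameSearch });

// Or bound the search by time; how much of the data was actually
// covered ends up in the statistics object's SearchedPercentile:
var stats = new LinqdbSelectStatistics();
var timed = db.Table<SomeData>()
              .SearchTimeLimited(f => f.NameSearch, "some text", 100) // assumed: time budget in ms
              .SelectEntity(stats);
```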
You can search by the start of a word by using SearchPartial. In this case, however, you cannot search a slice of the data: the search is performed on all rows.
If a property name ends with "SearchS", the search index will only use spaces to split out words (as opposed to spaces and special characters). For example, the string "hi@test me" would normally satisfy search queries made of the words: hi, test, me, hi@test. If, however, the property being searched ends with "SearchS", it will only satisfy queries containing these words: hi@test, me.
All operations on strings are case-insensitive.
Or
By default, when statements like .Where or .Search go together, they imply a logical "and" on the results. It is possible to have an "or" on neighbouring statements like so:
var res = db.Table<SomeData>()
.Search(f => f.Name, "some text").Or().Search(f => f.Name, "something else")
.Where(f => f.Id > 100)
.Select(f => new
{
Id = f.Id,
Name = f.Name
});
In this case only one of the searches needs to be satisfied to return a result. More than one .Or can be used.
Intermediate results
All the work happens in the .Select or .SelectEntity statement (generally, in the last statement), so things like this are possible:
var tmp = db.Table<SomeData>()
.Where(f => f.Normalized == 5);
if (person_id != null)
{
tmp.Where(f => f.PersonId == person_id); //no need to assign to tmp
}
var res = tmp.SelectEntity();
Count
To obtain the table count:
db.Table<SomeData>().Count();
It also works with conditions applied:
db.Table<SomeData>().Where(f => f.Id < 50).Count();
GetIds
The GetIds() method returns the list of Ids satisfying the conditions, without selecting the data itself.
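For example (assuming the return type is List<int>):

```csharp
// Fetch just the matching Ids, without materializing the rows:
List<int> ids = db.Table<SomeData>()
                  .Where(f => f.PeriodId == 5)
                  .GetIds();
```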
Save(item) will update the row with the item's Id using the item's properties, given that Id is not 0. If it is 0, a new item will be created and the new Id will be assigned to the object's Id property.
To update a column of multiple rows, construct a Dictionary<int, T> where T is the column's type, and:
var dic = new Dictionary<int, int?>();
dic[2] = 8;
dic[3] = 11;
db.Table<SomeData>().Update(f => f.PeriodId, dic);
This will update the column PeriodId of the rows with Ids 2 and 3 with the respective values 8 and 11.
To delete row(s), construct a HashSet<int> of the Ids to be deleted and:
db.Table<SomeData>().Delete(new HashSet<int>() { 2, 3 });
To increment values atomically, i.e. thread-safely, use .AtomicIncrement:
var new_item = new Counter() { Name = "unique_name", Value = 1 };
db.Table<Counter>().Where(f => f.Name == "unique_name").AtomicIncrement(f => f.Value, 1, new_item, null);
Linqdb supports transactions:
using (var transaction = new LinqdbTransaction())
{
var d = new SomeData()
{
Id = 1,
Normalized = 1.2,
PeriodId = 5
};
db.Table<SomeData>(transaction).Save(d); //note that .Table takes transaction as a parameter
var d2 = new BinaryData()
{
Id = 1,
Data = new List<byte>() { 1, 2, 3 }.ToArray()
};
db.Table<BinaryData>(transaction).Save(d2);
transaction.Commit(); //all writes happen here, if it fails - nothing gets modified (all or nothing)
}
If .Commit is not called, nothing is modified. A transaction must be created, used and disposed of in the same thread.
After .Save (or .SaveBatch) inside a transaction, new items (with Id == 0) are assigned new Ids, so that they can be used before the commit.
Transactions are supported on these data-modifying commands: .Save, .SaveBatch, .Update, .Delete.
Linqdb supports in-memory indexes on columns of type int, DateTime and double. In-memory indexes speed up reading operations on such columns.
Indexes are also required for the .GroupBy statement (see below). To build an index on a property use .CreatePropertyMemoryIndex, to remove one use .RemovePropertyMemoryIndex.
Indexes slow down data-modifying operations, so use them only when you have to. If indexes (or searchable string properties) are present, it is especially efficient to use batch operations, i.e. .SaveBatch, and to make the batches larger.
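A sketch of building and later dropping a property index (usage inferred from the method names above):

```csharp
// Build an in-memory index on PeriodId to speed up reads that filter on it:
db.Table<SomeData>().CreatePropertyMemoryIndex(f => f.PeriodId);

// ... queries filtering on PeriodId ...

// Remove it again when it no longer justifies the write overhead:
db.Table<SomeData>().RemovePropertyMemoryIndex(f => f.PeriodId);
```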
It is no longer recommended to use indexes to speed up operations. They don't seem to speed up anything, and there are better ways of improving performance: DistributedTable, PartitionedTable or DistributedPartitionedTable. Indexes are kept for GroupBy operations, see below.
Suppose you want to group by the column PeriodId and aggregate the column Value. For that, an in-memory index must be created:
db.Table<SomeData>().CreateGroupByMemoryIndex(f => f.PeriodId, f => f.Value);
Then you could do something like:
var res = db.Table<SomeData>()
.GroupBy(f => f.PeriodId)
.Select(f => new { f.Key, Sum = f.Sum(z => z.Value), Avg = f.Average(z => z.Value) });
These aggregation functions are supported: Count, CountDistinct, Sum, Max, Min, Average. Key selects the group-by property's value.
Linqdb supports working with generic types, db.Table<T>(), if you can think of a suitable use case.
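For instance, a small generic helper could wrap common persistence logic (a hypothetical sketch; T must meet Linqdb's entity requirements, i.e. an int Id property and supported column types, and the actual generic constraints Table<T> demands may differ):

```csharp
// Hypothetical generic repository sketch over db.Table<T>().
public static class Repo
{
    public static void SaveAll<T>(Db db, List<T> items) where T : class
    {
        // Batch save is the efficient path for bulk writes:
        db.Table<T>().SaveBatch(items);
    }

    public static int CountAll<T>(Db db) where T : class
    {
        return db.Table<T>().Count();
    }
}
```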
There is a certain limit on batch sizes in modification operations, so that they can be performed atomically (all or nothing) using ACID principles. If there is a need to modify a large amount of data and ACID principles are not important, there are convenience methods: SaveNonAtomically, UpdateNonAtomically, DeleteNonAtomically. These cannot be used in a transaction.
The same goes for Select vs SelectNonAtomically. Select reads using ACID principles against a specific data snapshot; however, there is a limit on how much data it can read in one go. If ACID is not important (for example, when no one is modifying the data), SelectNonAtomically can read any amount of data in iterations, using Select on smaller batches. So the non-atomic versions are just convenience methods that iteratively call their ACID-compliant counterparts.
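Assumed usage of the non-atomic variants (a sketch; the signatures are presumed to mirror their atomic counterparts and are not verified against the API):

```csharp
// Bulk-save without transactional guarantees
// (cannot be used inside a transaction):
db.Table<SomeData>().SaveNonAtomically(largeList);

// Read an arbitrarily large result set; internally this iterates
// over smaller ACID selects, so rows may change between iterations:
var everything = db.Table<SomeData>()
                   .Where(f => f.Value > 0)
                   .SelectNonAtomically(f => new { Id = f.Id });
```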
GetServerStatus(int timeToWaitInMs = 1000) returns the server's name (the servername property in config.txt) and returns no later than timeToWaitInMs milliseconds. If the server doesn't respond sooner, it is considered down.
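For example (assuming GetServerStatus is exposed on the Db object):

```csharp
// Ask the server to identify itself; give up after 500 ms.
// A timely reply returns the servername from config.txt;
// otherwise the server is considered down.
string name = db.GetServerStatus(500);
```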
Simple but fast in-memory queues are supported: db.Queue<T>().PutToQueue adds an item to queue T, and db.Queue<T>().GetAllFromQueue gets all of queue T's items and removes them from the queue.
Only one reader is guaranteed to get a specific item in a multi-threaded environment. Entities used in a queue must be marked with protobuf attributes.
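A sketch using protobuf-net style attributes (which attribute library Linqdb expects, and the exact PutToQueue signature, are assumptions here):

```csharp
[ProtoContract]
public class JobItem
{
    [ProtoMember(1)]
    public int Id { get; set; }

    [ProtoMember(2)]
    public string Payload { get; set; }
}

// Producer side:
db.Queue<JobItem>().PutToQueue(new JobItem { Id = 1, Payload = "job-1" });

// Consumer side: drains everything currently queued; each item is
// delivered to exactly one reader even with concurrent consumers.
List<JobItem> batch = db.Queue<JobItem>().GetAllFromQueue();
```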
Before using Linqdb one needs to create a Db object:
var db = new Db("DATA"); //embedded version
var db = new Db("host_ip:port"); //server's client version
The argument is either the path to the database (embedded) or the ip:port of the server.
The db object needs to be created once at application startup.
Before exiting application it is a good practice to dispose the db:
db.Dispose();
To make a copy of the database programmatically:
db.Replicate("PATH_TO_COPY");
If the directory exists, it will be removed before copying. The database can still be read and written while replication is in progress.
This has not been tested on a large database under heavy load.
A partitioned table is for the following situation: you have a large table but perform operations only on some slices of it. For example, you have stock prices but only work with per-company data. In such cases partitioned tables offer a great performance boost. Example usage:
var result = db.PartitionedTable<PricePoint>("msft").SelectEntity();
PartitionedTable takes a partition argument, which in this case is a company's ticker symbol. Under the hood a partitioned table is actually a separate table, and as such its records in RocksDB start with the same unique prefix.
Since those records are sorted, they end up in the same file, or in a much smaller number of files than if the table weren't partitioned. This is what makes operations on them fast.
The downside of a partitioned table is that you have to make multiple calls to the db when working with several partitions. For those operations the data could perhaps be duplicated in a non-partitioned table.
Transactions are not supported on partitioned tables.
The idea is to have a table similar to the regular .Table, but with the data distributed among N servers under the hood. This way you can scale your database using similar code. To create a distributed database, you just create N servers, construct the usual Db objects and pass them to DistributedDb:
var list = new List<Db>()
{
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
};
var distributeddb = new DistributedDb(list.Select((f, i) => new { Db = f, index = i }).ToDictionary(f => f.index + 1, f => f.Db));
Now distributeddb has .DistributedTable, which has operations similar to .Table. Here are the differences:
The entity class now must have Id and Sid (server id) properties. This pair identifies a record and is called the DistributedId. They are only used in the data-modification operations Delete and Update. To uniquely identify a record yourself, give it a Guid property instead of relying on Id and Sid.
No transactions are supported on a distributed table. It is possible that data gets saved on one server while another throws an exception (data not saved).
The GroupBy statement returns a dynamic result. Fewer aggregation functions are supported: Sum, Count, Min, Max, Average.
Some other notes:
Data is distributed randomly and uniformly among the servers.
If one server is down, the whole operation breaks.
DistributedPartitionedTable is also supported.
Embedded Linqdb is a one-process database: only one process can access it at a time, even if it's just for reading.
You cannot change the type of a column. If you need to, create a new column with the required type and copy the data there. (Changing a type from nullable to non-nullable and vice versa is OK.)
If you modify a class after it has been created (say, added a new column), you may want to manually fill the new column with a default value (e.g. if it is a nullable type, it won't automatically satisfy == null after creation). In other words, type changes don't automatically change data.
You can't select data from two or more tables in one data snapshot. Moreover, if you use SelectNonAtomically, there is a risk that someone modifies the data while the select is executing and it returns a row that doesn't match the Where statement. If you need ACID operations, you can achieve them within one table by avoiding the methods with NonAtomically in their names.
Try to keep the Ids of table records sequential and starting from 1; this way search and other operations are most efficient.
There is currently no fail-safe mechanism: if the server is down, the database is down. If one of the servers of a distributed db is down, the whole distributed db is down.
On Windows the server folder contains Server.exe, which starts listening on the port specified in config.txt.
The server has a configuration file, config.txt, which is self-explanatory.
You can install the server as a Windows service using NSSM.
On Linux a Docker image is available.