Technically it's just a wrapper around RocksDB. It has many similarities to relational databases, namely tables and SQL-style "select ... from ... where" thinking.
So why use it and when?
Linqdb supports rapid development with simple code and efficient operations. It is easy to scale Linqdb horizontally.
Some scenarios are a very good fit for Linqdb, for example when you have one writer that dumps a lot of data and many readers that use that data from different angles.
An entity (a table, a collection) is defined by a class. The class must have an int Id property: 0 indicates a new entity, any other value identifies the entity to be updated. Supported data types are: int, double, DateTime (and their nullables), byte[] and string. (Note that bool is not supported, use int instead.) For example, we'll use:
public class SomeData
{
public int Id { get; set; }
public int PeriodId { get; set; }
public int? ObjectId { get; set; }
public double Value { get; set; }
public double? Normalized { get; set; }
public int? PersonId { get; set; }
public DateTime? Date { get; set; }
public string Name { get; set; }
public string NameSearch { get; set; }
}
Then, to save a new record (the table will be created automatically, based on the type's name, if it doesn't exist yet):
using LinqDb;
...
var db = new Db("DATA");
var d = new SomeData()
{
Id = 0,
Normalized = 1.2,
PeriodId = 5
};
db.Table<SomeData>().Save(d);
db.Dispose();
To save a bulk of data more efficiently:
var list = new List<SomeData>();
...
db.Table<SomeData>().SaveBatch(list);
Select
Select everything from a table:
List<SomeData> res = db.Table<SomeData>()
.SelectEntity();
To select only some columns:
var res = db.Table<SomeData>()
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Note that only anonymous types are supported in this case; res is a List<AnonymousType>.
There is a limit on the amount of data that can be selected at once. If you need to select a large amount, there are non-atomic selects which do the work in iterations. The drawback of this method is that the data may change between iterations, so the result might not match the initial select conditions.
Where
Linqdb supports the Where clause:
var res = db.Table<SomeData>()
.Where(f => f.Normalized == 2.3 || f.Date > DateTime.Now && f.PersonId == 5)
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Where supports these operators: && || == >= > < <= !=. On the left-hand side of an operator there must be a plain property without any expression (i.e. f.PersonId % 2 == 0 won't work); on the right-hand side, a constant/variable/expression.
Types on both sides must match exactly (i.e. int != int?) and casting is only possible on the right side.
An empty string and an empty array are treated as null.
Between
To evaluate something like .Where(f => f.Normalized > 3 && f.Normalized < 10) Linqdb iterates the data twice: first it finds all values > 3, then all values < 10, and finally returns their intersection. So in this case it's faster to use .Between(f => f.Normalized, 3, 10, BetweenBoundaries.BothExclusive), which scans the data only once: from value 3 to 10.
var res = db.Table<SomeData>()
.Between(f => f.Normalized, 3, 10, BetweenBoundaries.BothExclusive)
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Intersect
Finds the intersection of the values in a table column with a given set:
var res = db.Table<SomeData>()
.Intersect(f => f.Normalized, new HashSet<double>() { 10, 20, 30 })
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Order, skip, take
Ordering (by a single column) is supported:
var res = db.Table<SomeData>()
.OrderBy(f => f.Normalized)
.Skip(10)
.Take(10)
.Select(f => new
{
Id = f.Id,
Normalized = f.Normalized
});
Ordering by Id is the fastest.
There are also overloads of Select and SelectEntity that take an object of type LinqdbSelectStatistics. This object will be populated with the number of records that satisfied the conditions (handy with .Skip and .Take when the total number is also needed). Its SearchedPercentile property shows how much SearchTimeLimited managed to search in the given time.
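For example, a paged query that also needs the total match count might be sketched like this (an illustration only: the exact overload shape and the members of LinqdbSelectStatistics should be checked against the library):

```csharp
// Sketch: SelectEntity is assumed to accept a statistics object
// that it populates with the total number of matching records.
var stats = new LinqdbSelectStatistics();
var page = db.Table<SomeData>()
             .Where(f => f.Normalized > 0)
             .OrderBy(f => f.Id)
             .Skip(20)
             .Take(10)
             .SelectEntity(stats);
// stats now reports how many records matched the Where,
// independent of Skip/Take, so paging UIs can show a total.
```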
Search
Linqdb supports simple full-text search on string columns whose names end with "Search" (or "SearchS"). For example,
var res = db.Table<SomeData>()
.Search(f => f.NameSearch, "some text")
.Select(f => new
{
Id = f.Id,
Name = f.NameSearch
});
will find rows where the column NameSearch contains both "some" and "text".
.Search takes optional parameters start_step and steps which enable searching only a part of the documents.
This is handy when you have lots of data and want to return some results as fast as possible.
The search will only happen on a slice of the data: for example, start_step = 0, steps = 1 means only the first 1000 documents will be searched; start_step = 10, steps = 5 means 5000 documents will be searched starting from document 10000.
You can get the total number of steps using the LastStep function.
You can also limit the search time by using SearchTimeLimited: it will search as much as it can in the given time (how much was searched is reported in the LinqdbSelectStatistics object's SearchedPercentile property).
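A stepped and a time-limited search might look roughly like this (a sketch based on the description above; the parameter order of the Search overload and the SearchTimeLimited signature are assumptions):

```csharp
// Search only the first step (roughly the first 1000 documents):
var quick = db.Table<SomeData>()
              .Search(f => f.NameSearch, "some text", 0, 1) // assumed: start_step, steps
              .Select(f => new { Id = f.Id, Name = f.NameSearch });

// Or bound the search by time; how much of the data was actually
// covered ends up in the statistics object's SearchedPercentile:
var stats = new LinqdbSelectStatistics();
var timed = db.Table<SomeData>()
              .SearchTimeLimited(f => f.NameSearch, "some text", 100) // assumed: time budget in ms
              .SelectEntity(stats);
```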
You can search by the start of a word by using SearchPartial. In this case, however, you cannot search a slice of the data: the search is performed on all rows.
If a property name ends with "SearchS", the search index will only use spaces to split out words (as opposed to spaces and special characters). For example, the string "hi@test me" would normally satisfy search queries made of the words: hi, test, me, hi@test. If, however, the property being searched ends with "SearchS", it will only satisfy queries containing these words: hi@test, me.
All operations on strings are case-insensitive.
Or
By default, when statements like .Where or .Search go together, they imply a logical "and" on the results. It is possible to have an "or" on neighbouring statements like so:
var res = db.Table<SomeData>()
.Search(f => f.Name, "some text").Or().Search(f => f.Name, "something else")
.Where(f => f.Id > 100)
.Select(f => new
{
Id = f.Id,
Name = f.Name
});
In this case only one of the searches needs to be satisfied to return a result. More than one .Or can be used.
Intermediate results
All the work happens in the .Select or .SelectEntity statement (generally, in the last statement), so things like this are possible:
var tmp = db.Table<SomeData>()
.Where(f => f.Normalized == 5);
if (person_id != null)
{
tmp.Where(f => f.PersonId == person_id); //no need to assign to tmp
}
var res = tmp.SelectEntity();
Count
To obtain the table count:
db.Table<SomeData>().Count();
It also works with conditions applied:
db.Table<SomeData>().Where(f => f.Id < 50).Count();
GetIds
The GetIds() method returns the list of Ids satisfying the conditions, without selecting the data itself.
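For example (assuming the return type is List<int>):

```csharp
// Fetch just the matching Ids, without materializing the rows:
List<int> ids = db.Table<SomeData>()
                  .Where(f => f.PeriodId == 5)
                  .GetIds();
```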
Save(item) will update the row with the item's Id using the item's properties, given that Id is not 0. If it is 0, a new item will be created and the new Id will be assigned to the object's Id property.
To update a column of multiple rows, construct a Dictionary<int, T> where T is the column's type, and:
var dic = new Dictionary<int, int?>();
dic[2] = 8;
dic[3] = 11;
db.Table<SomeData>().Update(f => f.PeriodId, dic);
This will update the column PeriodId of the rows with Ids 2 and 3 with the respective values 8 and 11.
To delete row(s), construct a HashSet<int> of the Ids to be deleted and:
db.Table<SomeData>().Delete(new HashSet<int>() { 2, 3 });
To increment values atomically, i.e. thread-safely, use .AtomicIncrement:
var new_item = new Counter() { Name = "unique_name", Value = 1 };
db.Table<Counter>().Where(f => f.Name == "unique_name").AtomicIncrement(f => f.Value, 1, new_item, null);
Linqdb supports transactions:
using (var transaction = new LinqdbTransaction())
{
var d = new SomeData()
{
Id = 1,
Normalized = 1.2,
PeriodId = 5
};
db.Table<SomeData>(transaction).Save(d); //note that .Table takes transaction as a parameter
var d2 = new BinaryData()
{
Id = 1,
Data = new List<byte>() { 1, 2, 3 }.ToArray()
};
db.Table<BinaryData>(transaction).Save(d2);
transaction.Commit(); //all writes happen here, if it fails - nothing gets modified (all or nothing)
}
If .Commit is not called, nothing is modified. A transaction must be created, used and disposed of in the same thread.
After .Save (or .SaveBatch) inside a transaction, new items (with Id == 0) are assigned new Ids, so that they can be used before the commit.
Transactions are supported on these data-modifying commands: .Save, .SaveBatch, .Update, .Delete.
Linqdb supports in-memory indexes on columns of type int, DateTime and double. In-memory indexes speed up reading operations on such columns.
Indexes are also required for the .GroupBy statement (see below). To build an index on a property use .CreatePropertyMemoryIndex, to remove one use .RemovePropertyMemoryIndex.
Indexes slow down data-modifying operations, so use them only when you have to. If indexes (or searchable string properties) are present, it is especially efficient to use batch operations, i.e. .SaveBatch, and to make the batches larger.
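A sketch of building and later dropping a property index (usage inferred from the method names above):

```csharp
// Build an in-memory index on PeriodId to speed up reads that filter on it:
db.Table<SomeData>().CreatePropertyMemoryIndex(f => f.PeriodId);

// ... queries filtering on PeriodId ...

// Remove it again when it no longer justifies the write overhead:
db.Table<SomeData>().RemovePropertyMemoryIndex(f => f.PeriodId);
```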
It is no longer recommended to use indexes to speed up operations. They don't seem to speed up anything, and there are better ways of improving performance: DistributedTable, PartitionedTable or DistributedPartitionedTable. Indexes are kept for GroupBy operations, see below.
Suppose you want to group by the column PeriodId and aggregate the column Value. For that, an in-memory index must be created:
db.Table<SomeData>().CreateGroupByMemoryIndex(f => f.PeriodId, f => f.Value);
Then you could do something like:
var res = db.Table<SomeData>()
.GroupBy(f => f.PeriodId)
.Select(f => new { f.Key, Sum = f.Sum(z => z.Value), Avg = f.Average(z => z.Value) });
These aggregation functions are supported: Count, CountDistinct, Sum, Max, Min, Average. Key selects the group-by property's value.
Linqdb supports working with generic types, db.Table<T>(), if you can think of a suitable use case.
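For instance, a small generic helper could wrap common persistence logic (a hypothetical sketch; T must meet Linqdb's entity requirements, i.e. an int Id property and supported column types, and the actual generic constraints Table<T> demands may differ):

```csharp
// Hypothetical generic repository sketch over db.Table<T>().
public static class Repo
{
    public static void SaveAll<T>(Db db, List<T> items) where T : class
    {
        // Batch save is the efficient path for bulk writes:
        db.Table<T>().SaveBatch(items);
    }

    public static int CountAll<T>(Db db) where T : class
    {
        return db.Table<T>().Count();
    }
}
```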
There is a certain limit on batch sizes in modification operations, so that they can be performed atomically (all or nothing) using ACID principles. If there is a need to modify a large amount of data and ACID principles are not important, there are convenience methods: SaveNonAtomically, UpdateNonAtomically, DeleteNonAtomically. These cannot be used in a transaction.
The same goes for Select vs SelectNonAtomically. Select reads using ACID principles against a specific data snapshot; however, there is a limit on how much data it can read in one go. If ACID is not important (for example, when no one is modifying the data), SelectNonAtomically can read any amount of data in iterations, using Select on smaller batches. So the non-atomic versions are just convenience methods that iteratively call their ACID-compliant counterparts.
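Assumed usage of the non-atomic variants (a sketch; the signatures are presumed to mirror their atomic counterparts and are not verified against the API):

```csharp
// Bulk-save without transactional guarantees
// (cannot be used inside a transaction):
db.Table<SomeData>().SaveNonAtomically(largeList);

// Read an arbitrarily large result set; internally this iterates
// over smaller ACID selects, so rows may change between iterations:
var everything = db.Table<SomeData>()
                   .Where(f => f.Value > 0)
                   .SelectNonAtomically(f => new { Id = f.Id });
```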
GetServerStatus(int timeToWaitInMs = 1000) returns the server's name (the servername property in config.txt) and returns no later than timeToWaitInMs milliseconds. If the server doesn't respond sooner, it is considered down.
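For example (assuming GetServerStatus is exposed on the Db object):

```csharp
// Ask the server to identify itself; give up after 500 ms.
// A timely reply returns the servername from config.txt;
// otherwise the server is considered down.
string name = db.GetServerStatus(500);
```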
Simple but fast in-memory queues are supported: db.Queue<T>().PutToQueue adds an item to queue T, and db.Queue<T>().GetAllFromQueue gets all of queue T's items and removes them from the queue.
Only one reader is guaranteed to get a specific item in a multi-threaded environment. Entities used in a queue must be marked with protobuf attributes.
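A sketch using protobuf-net style attributes (which attribute library Linqdb expects, and the exact PutToQueue signature, are assumptions here):

```csharp
[ProtoContract]
public class JobItem
{
    [ProtoMember(1)]
    public int Id { get; set; }

    [ProtoMember(2)]
    public string Payload { get; set; }
}

// Producer side:
db.Queue<JobItem>().PutToQueue(new JobItem { Id = 1, Payload = "job-1" });

// Consumer side: drains everything currently queued; each item is
// delivered to exactly one reader even with concurrent consumers.
List<JobItem> batch = db.Queue<JobItem>().GetAllFromQueue();
```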
Before using Linqdb one needs to create a Db object:
var db = new Db("DATA"); //embedded version
var db = new Db("host_ip:port"); //server's client version
The argument is either the path to the database (embedded) or the ip:port of the server.
The db object needs to be created once at application startup.
Before exiting application it is a good practice to dispose the db:
db.Dispose();
To make a copy of the database programmatically:
db.Replicate("PATH_TO_COPY");
If the directory exists, it will be removed before copying. The database can still be read and written while replication is in progress.
This has not been tested on a large database under heavy load.
A partitioned table is for the following situation: you have a large table but perform operations only on some slices of it. For example, you have stock prices but only work with per-company data. In such cases partitioned tables offer a great performance boost. Example usage:
var result = db.PartitionedTable<PricePoint>("msft").SelectEntity();
PartitionedTable takes a partition argument, which in this case is a company's ticker symbol. Under the hood a partitioned table is actually a separate table, and as such its records in RocksDB start with the same unique prefix.
Since those records are sorted, they end up in the same file, or in a much smaller number of files than if the table weren't partitioned. This is what makes operations on them fast.
The downside of a partitioned table is that you have to make multiple calls to the db when working with several partitions. For those operations the data could perhaps be duplicated in a non-partitioned table.
Transactions are not supported on partitioned tables.
The idea is to have a table similar to the regular .Table, but with the data distributed among N servers under the hood. This way you can scale your database using similar code. To create a distributed database, you just create N servers, construct the usual Db objects and pass them to DistributedDb:
var list = new List<Db>()
{
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
new Db("ip:port", "user", "pass"),
};
var distributeddb = new DistributedDb(list.Select((f, i) => new { Db = f, index = i }).ToDictionary(f => f.index + 1, f => f.Db));
Now distributeddb has .DistributedTable, which has operations similar to .Table. Here are the differences:
The entity class now must have Id and Sid (server id) properties. This pair identifies a record and is called the DistributedId. They are only used in the data-modification operations Delete and Update. To uniquely identify a record yourself, give it a Guid property instead of relying on Id and Sid.
No transactions are supported on a distributed table. It is possible that data gets saved on one server while another throws an exception (data not saved).
The GroupBy statement returns a dynamic result. Fewer aggregation functions are supported: Sum, Count, Min, Max, Average.
Some other notes:
Data is distributed randomly and uniformly among the servers.
If one server is down, the whole operation breaks.
DistributedPartitionedTable is also supported.
Embedded Linqdb is a one-process database: only one process can access it at a time, even if it's just for reading.
You cannot change the type of a column. If you need to, create a new column with the required type and copy the data there. (Changing a type from nullable to non-nullable and vice versa is OK.)
If you modify a class after it has been created (say, added a new column), you may want to manually fill the new column with a default value (e.g. if it is a nullable type, it won't automatically satisfy == null after creation). In other words, type changes don't automatically change data.
You can't select data from two or more tables in one data snapshot. Moreover, if you use SelectNonAtomically, there is a risk that someone modifies the data while the select is executing and it returns a row that doesn't match the Where statement. If you need ACID operations, you can achieve them within one table by avoiding the methods with NonAtomically in their names.
Try to keep the Ids of table records sequential and starting from 1; this way search and other operations are most efficient.
There is currently no fail-safe mechanism: if the server is down, the database is down. If one of the servers of a distributed db is down, the whole distributed db is down.
On Windows the server folder contains Server.exe, which starts listening on the port specified in config.txt.
The server has a configuration file, config.txt, which is self-explanatory.
You can install the server as a Windows service using NSSM.
On Linux a Docker image is available.