multiple tables query

Jan 29, 2013 at 5:58 AM

Imagine a typical usage scenario: employee/dept

 

Employee: EmpID, DeptID, EmpName, GivenName, Surname, Birthday

Dept: DeptID, DeptName, DeptTypeID

 

Scenario 1. Fetch employees where DeptID = 3 and Birthday between 1980-01-01 and 1990-01-01?

Scenario 2. Fetch employees (including DeptName) where DeptTypeID = 3 and DeptName like '%Sale%'

 

I believe we can use LINQ for these scenarios, but I am not sure about the performance of JOINs and LIKEs.

Coordinator
Jan 29, 2013 at 9:38 AM
Edited Jan 29, 2013 at 1:30 PM

Only specialized methods like transaction.SelectForwardStartFrom, SelectForwardStartsWith, etc. (from the transaction class) will work fast; pure LINQ (Where(r => r.Key == 12)) will need to iterate over the whole table.

As a first step, I recommend reading about the NoSQL concept in general, to understand how to solve such problems.

In short,

- Every table in DBreeze is a Key/Value store, so you can search efficiently only by keys.

If you have a complex object and you are going to search for this object in different ways, you must prepare as many Key/Value tables (index tables) as necessary to fulfill your request. Sure, the time for developing the DAL layer increases, but the system becomes really well tuned and can answer your requests as fast as possible.

In DBreeze, we have the ability to get a Key/Value pair by physical link, so all secondary index tables need to contain only the link to the primary table (which holds the complex object).
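To make the idea concrete, here is a minimal sketch in plain Python (illustration only, hypothetical names - not DBreeze code): a primary table holding the complex objects, and a secondary index whose values are only links back into the primary table.

```python
# Illustration only (plain Python, hypothetical names -- not DBreeze code):
# a primary table holding the "complex objects" and a secondary index whose
# values are only links back into the primary table.

primary = {}          # UserId -> user record (the complex object)
by_department = {}    # (DeptId, UserId) -> link into the primary table

def insert_user(user_id, dept_id, name):
    primary[user_id] = {"name": name, "dept": dept_id}
    # the secondary index stores only the link (here: the primary key),
    # analogous to storing DBreeze's 8-byte physical pointer
    by_department[(dept_id, user_id)] = user_id

def users_in_department(dept_id):
    # query the index first, then follow the links (like SelectDirect)
    return [primary[link] for (d, _), link in by_department.items() if d == dept_id]

insert_user(1, 3, "Ann")
insert_user(2, 3, "Bob")
insert_user(3, 4, "Cid")
print([u["name"] for u in users_in_department(3)])  # ['Ann', 'Bob']
```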

Examples (for all people, who are interested):

Scenario1:

In this scenario you are going to query by DeptID = 3 and then by employees' birthdays.

Options:

1. This query must be run 100 times per day and in total we have only 1000 employees -> Don't mess with it; just iterate over the whole table, comparing every element and returning the matches.

2. This query must be run 10 times per second and in total we have 10,000 employees -> make a copy of the table in memory and use a Dictionary to find the necessary elements. Every time you insert/delete/update a user, synchronize the entity between the in-memory storage and the database.

3. This query must be run quite often and we have 10,000,000 employees ->

     SubOption1: maybe still keep it in memory?

     SubOption2: we want to use DBreeze for search:

We start to analyze who must query, how often and why... and we find out that most of our queries must use DepartmentId.

First of all, we create the main table where we store our users. The Key can be a long (UserId), the Value the serialized User entity. After the insert we get a link to this key/value pair, and we can use this link to form our secondary index tables.

We have created one table; now it's time to create secondary tables with the names "UsersInDepartment1", "UsersInDepartment2", "UsersInDepartment3"... (or, in shortened form, "ud1", "ud2", "ud3"... "udN").

In each of these tables we have the list of users belonging to that department. So, in table UsersInDepartment1 (ud1) we will have the users from that department. The Key will be the userId, the Value will be a byte[] (8 bytes) - the link to the primary user table, so we can use transaction.SelectDirect later.

It means that if someone asks us to find some kind of user from Department 4, we can first query table "UsersInDepartment4" (ud4) to get all users from that department.

We can create another type of secondary index - a compound key: departmentID+birthDate.

DBreeze.Utils contains functions for converting different data types to sortable byte[] (the standard .NET conversions don't preserve sort order).

This technique is described in the documentation.

The main table and all its secondary index tables can be stored physically in one file, using the "Nested Tables" technique described in the documentation.
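Option 2 above (the in-memory copy) can be sketched like this in plain Python (illustration only, hypothetical names - not DBreeze code): a write-through mirror of the table, kept in sync on every modification.

```python
# Illustration of option 2 (plain Python, hypothetical names): a write-through
# in-memory mirror of the table, kept in sync on every modification.

db = {}       # stands in for the DBreeze table (EmpID -> record)
by_dept = {}  # in-memory Dictionary for fast per-department lookups

def upsert(emp_id, dept_id, name):
    record = {"dept": dept_id, "name": name}
    db[emp_id] = record                               # write to the database
    by_dept.setdefault(dept_id, {})[emp_id] = record  # and keep the mirror in sync

upsert(1, 3, "Ann")
upsert(2, 3, "Bob")
upsert(3, 5, "Cid")
print(sorted(e["name"] for e in by_dept[3].values()))  # ['Ann', 'Bob']
```

Deletes and updates would have to touch both structures the same way, which is exactly the synchronization cost the option describes.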

Jan 30, 2013 at 9:21 AM

You are right, I really need to read more about NoSQL practices. I am surprised to see such a solution in NoSQL - one we would never use in an RDBMS.

But doesn't it cause an excessive number of tables (thousands of depts lead to thousands of deptXXXX tables), and that's for only ONE piece of business logic?

Coordinator
Jan 30, 2013 at 9:30 AM

Actually, in an RDBMS there is no magic: if you have an index (exactly the same as here), your lookups are fast. With no index, traversing the whole table makes it slow.

If you don't want to have thousands of tables, you must use a compound index - then you will have only 1 table instead of thousands. The technique is described in the documentation. Compound indices are created the same way in an RDBMS.

Feb 4, 2013 at 7:02 AM
Regarding the compound key, there are typos in the online documentation (Google Docs) as well as in the PDF file.

I finally managed to compile the code:
var a = new Article { Id = 999, Name = "abc", Price = 15 };
using (var engine = new DBreezeEngine(@"C:\Temp\DBreeze2"))
{
    byte[] ptr = null;
    using (var tran = engine.GetTransaction())
    {
        //Inserting into the primary table
        tran.Insert<long, byte[]>(
            "Article",
            a.Id,                                   //Id - primary key
            a.Name.To_FixedSizeColumn(50, false)    //let it be not a DataBlock
                .Concat(a.Price.To_4_bytes_array_BigEndian()),
            out ptr                                 //getting back a physical pointer
            );

        //Inserting into the secondary index table
        tran.Insert<byte[], byte[]>(
            "ArticleIndexPrice",
            a.Price.To_4_bytes_array_BigEndian()    //compound key: Price+Id
                .Concat(a.Id.To_8_bytes_array_BigEndian()),
            ptr                                     //value is a pointer to the primary table
            );

        tran.Commit();

        var key = (ulong)a.Price.To_4_bytes_array_BigEndian().Concat(a.Id.To_8_bytes_array_BigEndian()).To_Int64_BigEndian();
        var data = tran.Select<ulong, byte[]>("Article", key);
        var actualCount = 0;
        Console.WriteLine(data.Value);
        //foreach (var item in data)
        {
            //Console.WriteLine(item.Key + "," + item.Value.Length);
            actualCount++;
        }
        Console.WriteLine("count: " + actualCount);
    }
}
I think I cannot get the "key" correct to fetch it back.
Coordinator
Feb 4, 2013 at 8:29 AM
Edited Feb 4, 2013 at 8:35 AM
var data = tran.Select<ulong, byte[]>("Article", key);

Your key is not ulong but byte[].

Try:
var compound_key = a.Price.To_4_bytes_array_BigEndian().Concat(a.Id.To_8_bytes_array_BigEndian());
var row = tran.Select<byte[], byte[]>("Article", compound_key);

if (row.Exists)
{
    float xPrice = row.Key.Substring(0, 4).To_float...;
    ulong xId = row.Key.Substring(4, 8).To_UInt64...;
}

Price+Id will give you a byte[] with length 12: 4 bytes for the Price and 8 bytes for the Id.
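The compound-key slicing can be illustrated in plain Python (a sketch under assumptions: a plain unsigned int stands in for DBreeze's sortable encodings, and the slices play the role of Substring(offset, length) on row.Key):

```python
import struct

# Illustration (plain Python, not DBreeze): build a 12-byte compound key
# price (4 bytes) + id (8 bytes), then slice it apart again.

def make_key(price, article_id):
    return struct.pack(">I", price) + struct.pack(">Q", article_id)

key = make_key(15, 999)
assert len(key) == 12                            # 4 + 8 bytes, as described
price = struct.unpack(">I", key[0:4])[0]         # like Substring(0, 4)
article_id = struct.unpack(">Q", key[4:12])[0]   # like Substring(4, 8)
print(price, article_id)  # 15 999
```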
Coordinator
Feb 4, 2013 at 8:53 AM
Example with DepartmentId and UserId.

We want to create the compound key DepartmentId+UserId. Having this, we can quickly find out which users belong to a given department.


using DBreeze;
using DBreeze.Utils;

long userId = 0;
using (var tran = engine.GetTransaction())
{
    //INSERTING USERS IN DEPARTMENTS (j is DepartmentId, userId is incremented per user)
    for (long j = 1; j <= 3; j++)
    {
        for (int i = 0; i < 3; i++)
        {
            userId++;

            //Forming key DepartmentId+UserId:
            var compoundKey = j.To_8_bytes_array_BigEndian().Concat(userId.To_8_bytes_array_BigEndian());
            //Inserting
            tran.Insert<byte[], string>("t", compoundKey, "test");
        }
    }
    tran.Commit();
}

//Now let's look at what is inside
using (var tran = engine.GetTransaction())
{
    foreach (var row in tran.SelectForward<byte[], string>("t"))
    {
        Console.WriteLine("Key: {0}; V: {1}", row.Key.ToBytesString(""), row.Value);
    }
}
Key: 80-00-00-00-00-00-00-01-80-00-00-00-00-00-00-01; V: test
Key: 80-00-00-00-00-00-00-01-80-00-00-00-00-00-00-02; V: test
Key: 80-00-00-00-00-00-00-01-80-00-00-00-00-00-00-03; V: test
Key: 80-00-00-00-00-00-00-02-80-00-00-00-00-00-00-04; V: test
Key: 80-00-00-00-00-00-00-02-80-00-00-00-00-00-00-05; V: test
Key: 80-00-00-00-00-00-00-02-80-00-00-00-00-00-00-06; V: test
Key: 80-00-00-00-00-00-00-03-80-00-00-00-00-00-00-07; V: test
Key: 80-00-00-00-00-00-00-03-80-00-00-00-00-00-00-08; V: test
Key: 80-00-00-00-00-00-00-03-80-00-00-00-00-00-00-09; V: test
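A side note on the 80-00-... bytes in the output above: signed values become byte-wise sortable by storing them big-endian with the sign bit flipped, which is why the value 1 prints as 80-00-00-00-00-00-00-01. A plain-Python illustration of that general trick (not DBreeze internals):

```python
import struct

def to_sortable_8_bytes(value):
    # big-endian signed 64-bit with the sign bit flipped, so that plain
    # byte-wise comparison of the encodings matches numeric order
    return struct.pack(">Q", (value + (1 << 63)) % (1 << 64))

encoded = [to_sortable_8_bytes(v) for v in (-5, -1, 0, 1, 42)]
assert encoded == sorted(encoded)     # byte order == numeric order
print(to_sortable_8_bytes(1).hex())   # 8000000000000001
```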

//Let's fetch all users which belong to DepartmentId = 2
Console.WriteLine("******");

using (var tran = engine.GetTransaction())
{
    long searchDepartment = 2;

    foreach (var row in tran.SelectForwardStartsWith<byte[], string>("t", searchDepartment.To_8_bytes_array_BigEndian()))
    {
        Console.WriteLine("Key: {0}; V: {1}; UserId: {2}", row.Key.ToBytesString(""), row.Value, row.Key.Substring(8, 8).To_Int64_BigEndian());
    }
}

Key: 80-00-00-00-00-00-00-02-80-00-00-00-00-00-00-04; V: test; UserId: 4
Key: 80-00-00-00-00-00-00-02-80-00-00-00-00-00-00-05; V: test; UserId: 5
Key: 80-00-00-00-00-00-00-02-80-00-00-00-00-00-00-06; V: test; UserId: 6
Feb 4, 2013 at 9:40 AM
Edited Feb 4, 2013 at 10:26 AM
Your sample for dept/user works like a charm! I like the idea of a compound key over the table-per-department solution.

two more questions about the compound key:
  1. it looks like through the extension methods we can convert other data types like Integer/Boolean/String/DateTime, etc. to byte[] and then concat them to form the final byte[] key - correct?
  2. combining a compound key with a range (like DateTime/long, etc.): I know there is SelectForwardFromTo, which will cover the range requirement, but at the same time I want to apply a compound key like deptId, etc. How?
I am writing a big and ambitious open source system which might produce millions of records a day (depending on actual usage), and I am looking for a default data storage (users can choose an RDBMS instead). After testing, I think a NoSQL engine like DBreeze is the better option. But in a real-world business system we cannot avoid relationships, which is what brought up my recent questions in this project.

Besides the relationships (compound keys), I would still like to have transparent serialization (not JSON, not XML, not .NET binary, because all three are SLOW and unnecessarily large due to meta info) like the one I wrote in the very first post in this thread. The user only needs to pass a POCO to insert, and fetches it back as a POCO, without explicit conversion.

Finally, I think referencing System.Web for the built-in .NET JSON serialization is not a good idea, because it forces the user to add the reference and rules out the client profile, although I understand your goal is to have no dependencies.
Coordinator
Feb 4, 2013 at 10:22 AM
From the DBreeze point of view, what you are talking about is another, upper layer. You can create this object layer, with serialization, automatic secondary index table creation, etc., and then maybe we can integrate it into the sources or ship it as an extra "Object DBreeze" DLL.
If you have questions or if you find issues, please, don't hesitate to write them down here.
Coordinator
Feb 4, 2013 at 10:24 AM
Edited Feb 4, 2013 at 10:25 AM
One more question about the compound key: can we do the same thing for other data types like Boolean/String/DateTime, etc.?

Of course... at the low level all data types are byte[], and in DBreeze.Utils we have lots of functions to convert all standard data types into sortable byte[] and back.
Coordinator
Feb 4, 2013 at 10:58 AM
combination of compound key and range (like DateTime/Long etc): I know there is SelectForwardFromTo which will solve the range requirement, but at the same time I want to apply compound key like deptId etc. How?

Probably via a table-name dimension:
"ud3" - a table that stores users' birthdays for DepartmentId = 3
Key: byte[] - compound key UserBirthday (DateTime, 8 bytes) + UserId (long, 8 bytes - to give uniqueness)
Value: byte[] - pointer to the user table.
Coordinator
Feb 4, 2013 at 11:06 AM
Edited Feb 4, 2013 at 12:52 PM
DeptID = 3 and Birthday between 1980-01-01 and 1990-01-01?

Also possible like this:

The compound index can be DepartmentId+Birthday+UserId => (long, 8 bytes)+(DateTime, 8 bytes)+(long, 8 bytes):

byte[] keyForSearch = ((long)3).To_8_bytes_array_BigEndian().ConcatMany(new DateTime(1980, 1, 1).To_8_bytes_array(), long.MinValue.To_8_bytes_array_BigEndian());

foreach (var row in tran.SelectForwardStartFrom<byte[], string>("t", keyForSearch, true))
{
    var deptId = row.Key.Substring(0, 8).To_Int64_BigEndian();
    //DepartmentId should still be 3
    if (deptId != 3)
        break;
    //The birthday should not exceed the upper search bound
    var birthday = row.Key.Substring(8, 8).To_DateTime();
    if (birthday >= new DateTime(1990, 1, 1))
        break;
}

You can feel the power of this approach when you have thousands or millions of records. When you have only 100 or so, just iterate over the whole table - you will not even notice it (speed-wise).
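The scan pattern described here (seek to dept + lowest possible remainder, walk forward, break once either bound is passed) can be modelled with a sorted list in plain Python (illustration only; tuples stand in for the byte[] compound keys):

```python
from bisect import bisect_left

# Keys are (dept, birth_year, user_id) tuples kept sorted; we seek to
# (3, 1980, MIN) and walk forward, breaking as soon as dept != 3 or
# birth_year >= 1990 -- so only matching rows are touched.
rows = sorted([
    (2, 1985, 1), (3, 1975, 2), (3, 1982, 3),
    (3, 1988, 4), (3, 1995, 5), (4, 1981, 6),
])

start = (3, 1980, float("-inf"))  # long.MinValue analogue for the user-id part
hits = []
for row in rows[bisect_left(rows, start):]:  # like SelectForwardStartFrom
    dept, birth_year, user_id = row
    if dept != 3:           # left the department prefix
        break
    if birth_year >= 1990:  # passed the upper birthday bound
        break
    hits.append(user_id)

print(hits)  # [3, 4]
```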
Feb 4, 2013 at 12:31 PM
hhblaze wrote:
From DBreeze point of view, what you are talking about is the other, upper, layer. You can create this object layer, with serialization, automatic secondary index tables creation etc... and then may be we can integrate it into the sources or as an extra "Object Dbreeze" DLL.
If you have questions or if you find issues, please, don't hesitate to write them down here.
Right, there should be such a layer; however, it would be better for it to be built-in. I will put DBreeze into real usage first, and possibly there will be more questions or even issues (no system is bug free - at the least there is potential for improvement, right?).
Coordinator
Feb 4, 2013 at 12:53 PM
In the last example
(DeptID = 3 and Birthday between 1980-01-01 and 1990-01-01)
it had to be SelectForwardStartFrom instead of SelectForwardStartsWith.
Feb 4, 2013 at 1:09 PM
Edited Feb 4, 2013 at 1:20 PM
Ok, I think because you hand-wrote the sample code there were quite a few errors; I finally fixed them and now it works. As you can see, with a custom serializer it runs faster and the code is cleaner (without all those To_xxx_yyy extension methods).

I was wondering why you define the search key like this:

byte[] keyForSearch = ((long)3).To_8_bytes_array_BigEndian().ConcatMany(new DateTime(1980-01-01).To_8_bytes_array_BigEndian(),long.MinValue.To_8_bytes_array_BigEndian())

In doing so, I cannot find a match, because the last part (the user id) does not match.

btw, the idea is that you match the dept and the start date of birth, then loop forward to the end date of birth, so theoretically the records to scan after the compound-key seek are significantly decreased?

there might be a performance concern:
  1. when the DOB is unique within the dept (DateTime.Parse("1980-01-01").AddDays(i)), adding 1M records takes just a few seconds
  2. when the DOB is highly duplicated (DateTime.Parse("1980-01-01").AddDays(i % 2)), it takes 70 seconds to add 1M records.
however, fetching all matching records is blazing fast!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using DBreeze;
using DBreeze.Utils;
using System.IO;

namespace ConsoleApplication29
{
    public class Emp
    {
        public string Name { get; set; }
        public ulong EmpID { get; set; }
        public ulong DeptID { get; set; }
        public DateTime DOB { get; set; }
        public ulong Long { get; set; }
        public double Double { get; set; }
    }


    public static class Extensions
    {
        public static byte[] Serialize(this Emp Item)
        {
            byte[] content;
            using (var buffer = new MemoryStream())
            {
                using (var writer = new BinaryWriter(buffer))
                {
                    writer.Write(Item.EmpID);
                    writer.Write(Item.DeptID);
                    writer.Write(Item.Double);
                    writer.Write(Item.Long);
                    writer.Write(Item.Name);
                    writer.Write(Item.DOB.ToString());
                    writer.Close();
                }
                content = buffer.ToArray();
            }
            return content;
        }

        public static Emp Deserialize(this byte[] Content)
        {
            Emp item = new Emp();
            using (var buffer = new MemoryStream(Content))
            {
                using (var reader = new BinaryReader(buffer))
                {
                    item.EmpID = reader.ReadUInt64();
                    item.DeptID = reader.ReadUInt64();
                    item.Double = reader.ReadDouble();
                    item.Long = reader.ReadUInt64();
                    item.Name = reader.ReadString();
                    item.DOB = DateTime.Parse(reader.ReadString());
                    reader.Close();
                }
            }
            return item;
        }
    }
    class Class1
    {
        static void Main(string[] args)
        {
            ulong userId = 0;
            using (var engine = new DBreezeEngine(@"E:\Temp\DBreeze2"))
            {
                using (var tran = engine.GetTransaction())
                {
                    //INSERTING USERS IN DEPARTMENT  (j is DepartmentId, i is UserId)

                    for (ulong j = 1; j <= 3; j++)
                    {
                        for (int i = 0; i < 3; i++)
                        {
                            userId++;

                            var emp = new Emp { EmpID = userId, DeptID = j, Double = i * 1.1, Long = (ulong)(i * 100), DOB = DateTime.Parse("1980-01-01").AddDays(i), Name = "User" + userId };

                            //Forming Key DepartmentId+UserId:
                            var compoundKey = j.To_8_bytes_array_BigEndian().ConcatMany(emp.DOB.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian());
                            //Inserting 
                            tran.Insert<byte[], byte[]>("t", compoundKey, emp.Serialize());
                        }
                    }
                    tran.Commit();
                }

                using (var tran = engine.GetTransaction())
                {
                    ulong searchDepartment = 2;
                    byte[] keyForSearch = searchDepartment.To_8_bytes_array_BigEndian().ConcatMany(DateTime.Parse("1980-01-01").To_8_bytes_array());

                    foreach (var row in tran.SelectForwardStartsWith<byte[], byte[]>("t", keyForSearch))
                    {
                        var emp = row.Value.Deserialize();
                        //The DepartmentId should still be 2
                        if (emp.DeptID != searchDepartment)
                            break;
                        //The birthday should not exceed the search value
                        if (emp.DOB > DateTime.Parse("1980-01-01"))
                            break;
                        Console.WriteLine("Key: {0}; DOB: {1}; UserId: {2}", row.Key.ToBytesString(""), emp.DOB, emp.EmpID);
                    }
                }
            }
            Console.WriteLine("done");
            Console.Read();
        }
    }
}
Coordinator
Feb 4, 2013 at 1:13 PM
So, it will be better like this:

long deptId = 1;
DateTime birthday = new DateTime(1980, 5, 15);
long userId = 0;

using (var tran = engine.GetTransaction())
{
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);

    userId++;
    birthday = new DateTime(1980, 5, 16);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 17);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);

    deptId = 2;

    userId++;
    birthday = new DateTime(1980, 5, 15);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 16);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 17);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 18);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 23);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);

    deptId = 3;

    userId++;
    birthday = new DateTime(1980, 5, 15);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 16);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);
    userId++;
    birthday = new DateTime(1980, 5, 17);
    tran.Insert<byte[], byte>("t", deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), userId.To_8_bytes_array_BigEndian()), 1);

    tran.Commit();
}

//Getting the data: birthdays of all users from (1980, 5, 16) to (1980, 5, 18) in department 2
using (var tran = engine.GetTransaction())
{
    deptId = 2;
    birthday = new DateTime(1980, 5, 16);

    byte[] searchKey = deptId.To_8_bytes_array_BigEndian().ConcatMany(birthday.To_8_bytes_array(), long.MinValue.To_8_bytes_array_BigEndian());
    //To get everything just starting from searchKey:
    //byte[] stopKey = deptId.To_8_bytes_array_BigEndian().ConcatMany((new DateTime(9999, 1, 1)).To_8_bytes_array(), long.MaxValue.To_8_bytes_array_BigEndian());
    byte[] stopKey = deptId.To_8_bytes_array_BigEndian().ConcatMany((new DateTime(1980, 5, 18)).To_8_bytes_array(), long.MaxValue.To_8_bytes_array_BigEndian());

    //foreach (var row in tran.SelectForwardStartFrom<byte[], byte>("t", searchKey, true))
    foreach (var row in tran.SelectForwardFromTo<byte[], byte>("t", searchKey, true, stopKey, true))
    {
        Console.WriteLine("Key: {0}; Dept: {1}; Birthdate: {2}; userID: {3}", row.Key.ToBytesString("")
            , row.Key.Substring(0, 8).To_Int64_BigEndian()
            , row.Key.Substring(8, 8).To_DateTime().ToString("dd.MM.yyyy")
            , row.Key.Substring(16, 8).To_Int64_BigEndian()
            );
    }
}
Feb 4, 2013 at 1:23 PM
Edited Feb 4, 2013 at 1:32 PM
I raised some concerns about performance in my last post.
Feb 4, 2013 at 1:32 PM
Amazing! I really like this solution!

I have other investigations and I will use another thread.


hhblaze wrote:
So, it will be better like this:
Coordinator
Feb 4, 2013 at 1:54 PM
unruledboy wrote:
I raised some concerns about peformance in my last post.
DBreeze is designed to be fast at bulk inserts when keys are supplied sorted in ascending order, so collect chunks in memory (and sort them) before inserting.
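The "collect and sort before inserting" advice can be sketched in plain Python (illustration only, hypothetical names - the actual insert calls would be DBreeze's tran.Insert inside one transaction):

```python
import struct

def sortable_key(dept_id, user_id):
    # two unsigned big-endian 64-bit fields: byte order matches (dept, user) order
    return struct.pack(">QQ", dept_id, user_id)

# keys arrive in arbitrary order; sort the chunk once before the bulk insert
chunk = [(sortable_key(d, u), b"payload") for d in (3, 1, 2) for u in (2, 1)]
chunk.sort(key=lambda kv: kv[0])  # ascending key order, as the engine prefers
# then: open one transaction, insert the chunk in this order, Commit

print([struct.unpack(">QQ", k) for k, _ in chunk][:3])  # [(1, 1), (1, 2), (2, 1)]
```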