composite key usage

May 9, 2013 at 6:31 AM
Composite keys are no doubt powerful. I have a typical class hierarchy:
    public enum SomeTypes
    {
        Low = 1,
        Middle = 2,
        High = 3
    }

    public class BaseObject
    {
        public ulong ID { get; set; }
        public DateTime CreatedDate { get; set; }
        public string Name { get; set; }
    }

    public class Company : BaseObject
    {
        public string Logo { get; set; }
    }

    public class Category : BaseObject
    {
        public string Something { get; set; }
    }
    
    public class BizObject : BaseObject
    {
        public ulong CompanyID { get; set; }
        public ulong CategoryID { get; set; }
        public SomeTypes Some { get; set; }
        public int Value { get; set; }
    }
and the key is CompanyID + CategoryID + CreatedDate + Some + ID:
            using (var engine = new DBreezeEngine(@"C:\temp\DBR1"))
            {
                using (var tran = engine.GetTransaction())
                {
                    try
                    {
                        var max = tran.Max<byte[], BizObject>(ObjectName);
                        var biz = new BizObject { CompanyID = 1, CategoryID = 2, CreatedDate = DateTime.Now, Name = "First", Some = SomeTypes.Low, Value = 123 };
                        if (max.Exists)
                            biz.ID = max.Value.ID;
                        biz.ID++;
                        var key = biz.CompanyID.To_8_bytes_array_BigEndian().ConcatMany(biz.CategoryID.To_8_bytes_array_BigEndian(), ((uint)biz.Some).To_4_bytes_array_BigEndian(), biz.CreatedDate.To_8_bytes_array(), biz.ID.To_8_bytes_array_BigEndian());
                        tran.Insert<byte[], BizObject>(ObjectName, key, biz);
                        tran.Commit();
                    }
                    catch (Exception)
                    {
                    }
                }
            }
The reason I chose a key like that is that it can quickly fetch matching records for a specific CompanyID/CategoryID etc.

Here comes a problem: the biz object needs to be associated with other business objects, which I normally reference by ID (ulong). If so, how can I know the CreatedDate and Some even though I know the ID and the CompanyID/CategoryID (just assume)?

If not, that means I need to associate using the full composite key, or have a separate table containing the ID as key and the composite key as value.

Do you have any thoughts about it?
Coordinator
May 9, 2013 at 2:02 PM
Edited May 9, 2013 at 2:03 PM
I like the approach of a master table with a standard auto-incrementing index:
1 obj1
2 obj2
3 obj3

After inserting each of these elements you get a pointer to it (refId; look at the overloads of the modification operations, like Insert).

then you create as many secondary tables as you need:
  • This table will give you the ability to sort all objects by creation date:
    Key Value
    creationDate+ID+referenceToMasterTable null
  • This table will give you the ability to sort (using Forward/Backward StartFrom or StartsWith) by company and creation date:
    Key Value
    companyId+creationDate+referenceToMasterTable null
  • etc., up to your needs and imagination.
Then search by the secondary index and SelectDirect by referenceToMasterTable from the master table to get the object itself.
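For illustration, a rough sketch of that pattern - a sketch only, not verbatim API: the table names and key layout are illustrative, the Insert overload returning a refPtr and the SelectDirect signature are assumptions based on the description above, and engine/biz/companyId come from surrounding code:

using (var tran = engine.GetTransaction())
{
    tran.SynchronizeTables("Biz", "BizByCompanyDate");

    // 1. Master table: plain growing ulong key -> object. The Insert overload
    //    is assumed to hand back a reference pointer (refPtr) to the stored value.
    byte[] refPtr = null;
    tran.Insert<ulong, DbMJSON<BizObject>>("Biz", biz.ID, biz, out refPtr);

    // 2. Secondary index: companyId + creationDate as key. For simplicity the
    //    master-table pointer is stored as the value here (the post above
    //    appends it to the key instead).
    var idxKey = biz.CompanyID.To_8_bytes_array_BigEndian()
                    .Concat(biz.CreatedDate.To_8_bytes_array());
    tran.Insert<byte[], byte[]>("BizByCompanyDate", idxKey, refPtr);

    tran.Commit();
}

// Later: range-scan the secondary index by company and fetch the objects directly.
using (var tran = engine.GetTransaction())
{
    var prefix = companyId.To_8_bytes_array_BigEndian();
    foreach (var row in tran.SelectForwardStartsWith<byte[], byte[]>("BizByCompanyDate", prefix))
    {
        var obj = tran.SelectDirect<ulong, DbMJSON<BizObject>>("Biz", row.Value);
        if (obj.Exists)
        {
            var bizObject = obj.Value.Get; // deserialized BizObject
        }
    }
}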
May 9, 2013 at 10:38 PM
Edited May 9, 2013 at 10:39 PM
Blaze, yes, that's nice, but I find one thing troublesome about the "DIRECT" methods. If the stored record changes, the PTRs stored in the secondary indexes, etc., have to be regenerated for that record - for example, if Technical_SetTable_OverwriteIsNotAllowed is on, or if the stored record needs more capacity. This backtracking can be difficult.
Coordinator
May 10, 2013 at 12:09 AM
Edited May 10, 2013 at 8:18 AM
What is difficult? The question is whether it's possible or not.

Actually the skeleton of saving an object can look like this (a rough sketch in code follows the list):
  • find out if an object with such an index already exists in the master table
  • if yes -> get it and remove all corresponding keys in all secondary index tables of that master table, or update it (it all depends on what is held in key and value)
  • insert the object into the master table (with the same or a newly generated id for update/insert), get the new refPtr for SelectDirect
  • fill all secondary index tables with the newly inserted refPtr
  • commit the transaction
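A minimal sketch of that skeleton, under the same assumptions as the earlier example (the BuildCompanyDateKey helper is hypothetical - it just rebuilds the secondary key from an object's fields):

using (var tran = engine.GetTransaction())
{
    tran.SynchronizeTables("Biz", "BizByCompanyDate");

    // 1. Does an object with this id already exist in the master table?
    var existing = tran.Select<ulong, DbMJSON<BizObject>>("Biz", biz.ID);
    if (existing.Exists)
    {
        // 2. Remove the secondary keys built from the old field values.
        var old = existing.Value.Get;
        tran.RemoveKey<byte[]>("BizByCompanyDate", BuildCompanyDateKey(old)); // hypothetical helper
    }

    // 3. Insert/update the object in the master table and get a fresh refPtr for SelectDirect.
    byte[] refPtr = null;
    tran.Insert<ulong, DbMJSON<BizObject>>("Biz", biz.ID, biz, out refPtr);

    // 4. Re-fill the secondary index with the new key and the new pointer.
    tran.Insert<byte[], byte[]>("BizByCompanyDate", BuildCompanyDateKey(biz), refPtr);

    // 5. Commit everything as one transaction.
    tran.Commit();
}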

Also, if you update only one or a few (less than 100) objects per round, there is no sense in using Technical_SetTable_OverwriteIsNotAllowed - it will also work fast and will save space.
May 13, 2013 at 1:19 AM
OK, with your suggestion every new object will require (1 + N) writes, so 1 MLN new objects = (1 + N) MLN writes. Is the trade-off worth it?
Coordinator
May 13, 2013 at 8:39 AM
Test your scenarios with the DBreeze tool and find the approach that works best for you.
May 30, 2013 at 12:51 PM
You said:
This table will give you the ability to sort (using Forward/Backward StartFrom or StartsWith) by company and creation date
Key Value
companyId+creationDate+referenceToMasterTable null
It means that for each record inserted into the above table the index has to be kept sorted (companyId does not grow monotonically like creationDate), so it's a random insert, and as the data grows, performance will degrade, right?
Coordinator
May 30, 2013 at 3:30 PM
Edited May 30, 2013 at 9:38 PM
Why should it degrade?
DBreeze internally has LianaTrie, its variation of a radix trie.
After an insert there is no sorting procedure - the element is already in the correct place.
May 30, 2013 at 5:27 PM
I am writing here because it is about composite keys:

Blaze, if you have a table with composite keys, is there any possibility to have some sort of .Count() implemented that would count only keys with a certain prefix (a .CountPrefix(byte[]) method)? Is it even possible, that is, without iterating through all elements?
Coordinator
May 30, 2013 at 8:20 PM
Edited May 30, 2013 at 10:06 PM
For now DBreeze must iterate through the elements using StartsWith(given prefix).Count() to calculate the count you want.
If you need really fast statistics, you have to maintain them in parallel with your inserts/updates/removes and store them, within the same transaction, in a statistics table.
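For example, something like this (a sketch only; table names and keys are illustrative, idxKey/companyId come from surrounding code, and .Count() needs System.Linq):

// Counting by iterating over a key prefix - what is available today:
using (var tran = engine.GetTransaction())
{
    var prefix = companyId.To_8_bytes_array_BigEndian();
    int cnt = tran.SelectForwardStartsWith<byte[], byte[]>("BizByCompanyDate", prefix).Count();
}

// Maintaining a counter in a statistics table, updated in the same transaction as the data:
using (var tran = engine.GetTransaction())
{
    tran.SynchronizeTables("BizByCompanyDate", "Stats");

    tran.Insert<byte[], byte[]>("BizByCompanyDate", idxKey, null);

    var statRow = tran.Select<byte[], ulong>("Stats", prefix);
    ulong counter = statRow.Exists ? statRow.Value : 0;
    tran.Insert<byte[], ulong>("Stats", prefix, counter + 1);

    tran.Commit();
}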
Coordinator
May 30, 2013 at 8:25 PM
Edited May 30, 2013 at 8:27 PM
BTW, there is a new ver. 50 with a new IO layer.
  • Technical_SetTable_OverwriteIsNotAllowed works like crazy.
  • Pure updates (without Technical_SetTable_OverwriteIsNotAllowed) also work much faster: in 1 second it's possible to overwrite up to 3-5K values (with Technical_SetTable_OverwriteIsNotAllowed you get that in a few milliseconds, though).
Please test it.
Coordinator
May 30, 2013 at 10:11 PM
unruledboy wrote:
It means that for each record inserted into the above table the index has to be kept sorted (companyId does not grow monotonically like creationDate), so it's a random insert, and as the data grows, performance will degrade, right?
DBreeze internally has LianaTrie, its variation of a radix trie.
After an insert there is no sorting procedure - the element is already in the correct place.
May 31, 2013 at 5:46 AM
Edited May 31, 2013 at 5:47 AM
Btw Blaze, the rollback file sometimes accumulates in size - any ideas?
May 31, 2013 at 6:51 AM
krome wrote:
Btw Blaze, the rollback file sometimes accumulates in size - any ideas?
I noticed that when I try to insert about 5000 records, the data file finishes writing, but the .rol (rollback?) file keeps being written (it gets bigger and bigger) for a long time before the whole transaction finishes. Does that mean most of the time is spent on rollback operations?
May 31, 2013 at 6:54 AM
hhblaze wrote:
Why should it degrade?
DBreeze internally has LianaTrie, its variation of a radix trie.
After an insert there is no sorting procedure - the element is already in the correct place.
I mean the time it takes to insert. I believe the insert is faster for a new table, but when the table has, say, 1 million records, and because the index I mentioned is (companyId + CreatedTime), if the companyId is not the largest one it will be inserted somewhere in between (a random insert), so the speed will be slower than with a blank table, right?
Coordinator
May 31, 2013 at 7:44 AM
Edited May 31, 2013 at 9:32 AM
unruledboy wrote:

> I mean the time it takes to insert. I believe the insert is faster for a new table, but when the table has, say, 1 million records, and because the index I mentioned is (companyId + CreatedTime), if the companyId is not the largest one it will be inserted somewhere in between (a random insert), so the speed will be slower than with a blank table, right?

OK, if your data accumulation strategy is to insert small batches in between, the speed will always remain the same before and after 1 MLN records.

For the bigger batches I made special tests for you. Look carefully; the numbers will explain everything.

Procedure 1. In this procedure we insert a composite index (all data is, of course, sorted ASC in memory). Dispersion is not very high but quite good (you can play with it).
 private void testF_002()
        {
            Console.WriteLine("start");
            DBreeze.Diagnostic.SpeedStatistic.ToConsole = true;
            DBreeze.Diagnostic.SpeedStatistic.StartCounter("a");

            using (var tran = engine.GetTransaction())
            {
                DateTime dt = DateTime.MinValue;
                byte[] btKey = null;
                long companyId = 580;
                Random rnd = new Random();

                for (int i = 0; i < 1000000; i++)
                {

                    dt = dt.AddTicks(200);

                    btKey = companyId.To_8_bytes_array_BigEndian().Concat(dt.Ticks.To_8_bytes_array_BigEndian());
                    tran.Insert<byte[], byte>("t1", btKey, 1);
                }


                tran.Commit();
            }

            DBreeze.Diagnostic.SpeedStatistic.PrintOut("a", true);

        }
Our key is 16 bytes = companyId + dateTime.Ticks. Between every record there is a 200-tick difference, something like this: 580200 580400 580600 ...

This procedure executes in 9 sec and takes 33 MB of HDD space.

Procedure 2. In this procedure we insert other values in between the existing values.
private void testF_002_1()
        {
            Console.WriteLine("start");
            DBreeze.Diagnostic.SpeedStatistic.ToConsole = true;
            DBreeze.Diagnostic.SpeedStatistic.StartCounter("a");

            using (var tran = engine.GetTransaction())
            {
                DateTime dt = DateTime.MinValue;
                byte[] btKey = null;
                long companyId = 580;
                Random rnd = new Random();
                long ticks = 0;

                //Regulate with or without
                tran.Technical_SetTable_OverwriteIsNotAllowed("t1");

                for (int i = 0; i < 1000000; i++)
                {

                    dt = dt.AddTicks(200);
                    //was inserted: 200 400 600 800 1000 1200 ...
                    //now we insert in between: 200 (now we insert 320) 400 (470) 600 (690) 800 (817) 1000 (1180) 1200 ...
                    ticks = dt.AddTicks((-1) * (rnd.Next(150) + 1)).Ticks;
                    btKey = companyId.To_8_bytes_array_BigEndian().Concat(ticks.To_8_bytes_array_BigEndian());
                    tran.Insert<byte[], byte>("t1", btKey, 1);
                }


                tran.Commit();
            }

            DBreeze.Diagnostic.SpeedStatistic.PrintOut("a", true);

        }
Now data will look like: 580200 (580215) 580400 (580489) 580600 (580702) 580800 (580901) 581000 ...

There is a line in the procedure - tran.Technical_SetTable_OverwriteIsNotAllowed("t1");

If we run with this line enabled: 18 sec, size 80 MB.
If we run with this line disabled: 86 sec, size 78 MB.


Procedure 3. We count the quantity of stored data.
 private void testF_002_2()
        {
             Console.WriteLine("start");
              DBreeze.Diagnostic.SpeedStatistic.ToConsole = true;
            DBreeze.Diagnostic.SpeedStatistic.StartCounter("a");
            using (var tran = engine.GetTransaction())
            {
                int cnt = 0;
                foreach (var row in tran.SelectBackward<byte[], byte>("t1"))
                {
                    cnt++;
                }

                Console.WriteLine("C1: {0}; C2: {1}", tran.Count("t1"), cnt);
            }
            DBreeze.Diagnostic.SpeedStatistic.PrintOut("a", true);
        }
The result: 2,000,000 records are inside.
Iteration took 15 seconds.


I hope it will help you.
Coordinator
May 31, 2013 at 8:17 AM
unruledboy wrote:
krome wrote:
Btw Blaze, the rollback file sometimes accumulates in size - any ideas?
I noticed that when I try to insert about 5000 records, the data file finishes writing, but the .rol (rollback?) file keeps being written (it gets bigger and bigger) for a long time before the whole transaction finishes. Does that mean most of the time is spent on rollback operations?
DBreeze version 50 or less?
If 50 - send me an example; if less - there's no sense investigating - use 50.
May 31, 2013 at 1:00 PM
@hhblaze

I ran your 3 tests; the results are more or less the same as yours:

procedure 1: 15 seconds
procedure 2: 15 seconds
procedure 3: 22 seconds (why is reading slower than writing?)

And when I run procedure 2 a second time, it takes 39 seconds (with Technical_SetTable_OverwriteIsNotAllowed).

And then running procedure 3 a second time (after running procedure 2 the second time) takes 53 seconds.
Coordinator
May 31, 2013 at 1:14 PM
Edited May 31, 2013 at 1:17 PM
Read is sometimes slower than write; it's possible when the write is very fast :)

When I run proc2 a second time I get not 2,000,000 records in the table but approximately 3 MLN; that's why procedure 3 executes longer (in my case only 24 seconds, and after running proc2 a third time and reaching around 4 MLN records, proc3 executes in 35 seconds).

BTW, running proc2 the second and third time takes around 22 seconds each for me (this is not the same PC on which I executed the first tests, but a slightly slower one).
May 31, 2013 at 2:09 PM
Less than 50. I haven't tested 50 yet.
Jun 1, 2013 at 5:39 AM
hhblaze wrote:
unruledboy wrote:
krome wrote:
Btw Blaze, the rollback file sometimes accumulates in size - any ideas?
I noticed that when I try to insert about 5000 records, the data file finishes writing, but the .rol (rollback?) file keeps being written (it gets bigger and bigger) for a long time before the whole transaction finishes. Does that mean most of the time is spent on rollback operations?
DBreeze version 50 or less?
If 50 - send me an example; if less - there's no sense investigating - use 50.
It's less than 50.

In 50, it's working fine.
Jun 1, 2013 at 5:44 AM
Edited Jun 1, 2013 at 5:47 AM
With version 50, I ran the following code (RELEASE, not Debug), and there was no other heavy disk activity in the system.

For 100K records it runs for 87 seconds. The hardware is a cheap laptop with a 5400 RPM disk, 2 cores and 4 GB RAM.

Is it the complex object model or an inefficiency in DbMJSON that causes the insert speed to be only around 1K/s?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DBreeze;
using DBreeze.Transactions;
using DBreeze.Utils;
using DBreeze.DataTypes;

namespace DBreeze
{
    public enum SomeTypes
    {
        Low = 1,
        Middle = 2,
        High = 3
    }

    public class BaseObject
    {
        public ulong ID { get; set; }
        public DateTime CreatedDate { get; set; }
        public string Name { get; set; }
    }

    public class Company : BaseObject
    {
        public string Logo { get; set; }
    }

    public class Category : BaseObject
    {
        public string Something { get; set; }
    }

    public class BizObject : BaseObject
    {
        public ulong CompanyID { get; set; }
        public ulong CategoryID { get; set; }
        public SomeTypes Some { get; set; }
        public int Value { get; set; }
    }

    class Test
    {
        static void Main(string[] args)
        {
            Generate();
            Console.Read();
        }

        private static void Generate()
        {
            DBreeze.Diagnostic.SpeedStatistic.ToConsole = true;
            DBreeze.Diagnostic.SpeedStatistic.StartCounter("a");
            using (var engine = new DBreezeEngine(@"C:\Temp\Test"))
            {
                using (var tran = engine.GetTransaction())
                {
                    tran.SynchronizeTables("BizResult", "BizResultIndex");
                    for (int i = 0; i < 100000; i++)
                    {
                        var r = new BizObject { CompanyID = (ulong)(i % 5), CategoryID = 2, CreatedDate = DateTime.Now.AddSeconds(i), Name = "First", Value = i };
                        var row = tran.Max<ulong, DbMJSON<BizObject>>("BizResult");
                        ulong id = 0;
                        if (row.Exists)
                            id = row.Value.Get.ID;
                        id++;
                        r.ID = id;
                        tran.Insert<ulong, DbMJSON<BizObject>>("BizResult", r.ID, r);

                        var key = r.CompanyID.To_8_bytes_array_BigEndian().Concat(r.CreatedDate.To_8_bytes_array());
                        tran.Insert<byte[], byte[]>("BizResultIndex", key, null);
                    }
                    tran.Commit();
                }
            }
            DBreeze.Diagnostic.SpeedStatistic.PrintOut("a", true);
        }

    }
}
Coordinator
Jun 1, 2013 at 12:23 PM
Edited Jun 1, 2013 at 12:26 PM
unruledboy

There are 2 problems in the code.
  • In every iteration you try to get the maximal value:
var row = tran.Max<ulong, DbMJSON<BizObject>>("BizResult");

and afterwards you make an Insert:

tran.Insert<ulong, DbMJSON<BizObject>>("BizResult", r.ID, r);

The fact is that in such a case you use the same trie instance
for both operations, and the trie must re-adjust itself every time.

That's why in the Select->Insert pattern we use Select with readVisibilityScope set to true.

It creates a second trie instance, so Select and Insert use different instances, and speed increases by an order of magnitude. For Max and Min we don't have readVisibilityScope because it makes no sense inside one procedure: it's enough to take the value once and then increment it along the procedure. (See the short sketch after this list.)
  • The second problem is in the code of your secondary index: you insert random values in non-ascending order. The best way is to generate the secondary index entries in an in-memory dictionary and insert them already sorted ASC, probably also using tran.Technical_SetTable_OverwriteIsNotAllowed("BizResultIndex");
    DBreeze likes values sorted ASC on insert to gain maximal speed.
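A minimal sketch of the Select->Insert pattern with readVisibilityScope (the position of the boolean parameter is an assumption - check the actual Select overloads in your DBreeze version; someKey and newValue are placeholders):

using (var tran = engine.GetTransaction())
{
    for (int i = 0; i < 100000; i++)
    {
        // Read through a separate read-visibility trie instance (assumed overload),
        // so the write trie does not have to re-adjust between Select and Insert.
        var row = tran.Select<ulong, DbMJSON<BizObject>>("BizResult", someKey, true);

        // ... build newValue from row ...

        tran.Insert<ulong, DbMJSON<BizObject>>("BizResult", someKey, newValue);
    }
    tran.Commit();
}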
Solution:

The first part (without inserting the secondary index) runs in 3.5 sec.
The complete proc runs in around 12 seconds (without optimizing the secondary index insert, as mentioned above):
private void testF_003()
        {
            DBreeze.Diagnostic.SpeedStatistic.ToConsole = true;
            DBreeze.Diagnostic.SpeedStatistic.StartCounter("a");
           
            using (var tran = engine.GetTransaction())
            {
                tran.SynchronizeTables("BizResult", "BizResultIndex");
              //  tran.Technical_SetTable_OverwriteIsNotAllowed("BizResultIndex");

                var row = tran.Max<ulong, DbMJSON<BizObject>>("BizResult");

                ulong id = 0;

                if (row.Exists)
                    id = row.Value.Get.ID;

                for (int i = 0; i < 100000; i++)
                {
                    var r = new BizObject { CompanyID = (ulong)(i % 5), CategoryID = 2, CreatedDate = DateTime.Now.AddSeconds(i), Name = "First", Value = i };
                                      
                    id++;
                    r.ID = id;
                    tran.Insert<ulong, DbMJSON<BizObject>>("BizResult", r.ID, r);
                   //EOF FIRST PART
                    var key = r.CompanyID.To_8_bytes_array_BigEndian().Concat(r.CreatedDate.To_8_bytes_array());
                    tran.Insert<byte[], byte[]>("BizResultIndex", key, null);
                    //EOF SECOND PART
                }
                tran.Commit();
            }
            DBreeze.Diagnostic.SpeedStatistic.PrintOut("a", true);
        }
Jun 2, 2013 at 11:25 AM
With your changes, the first part runs in 9 seconds and the complete proc runs in 30 seconds.

If it gets the max id before the iterations, is it thread-safe?
Coordinator
Jun 2, 2013 at 12:45 PM
Edited Jun 2, 2013 at 12:45 PM
unruledboy wrote:
With your changes, the first part runs in 9 seconds and the complete proc runs in 30 seconds.

If it gets the max id before the iterations, is it thread-safe?
This is all off-topic, but the speed depends very much on hardware and drivers. Be sure to have the latest drivers for the motherboard. If the OS runs under a virtual machine, the speed also depends on VM tuning. The speed I've published is the real speed on one of my PCs (its characteristics are listed in the benchmark docs).

OK, you probably have to re-read the docs to refresh that after tran.SynchronizeTables(tables***) (or the first modification command, in the case of a single modified table inside the transaction scope) you can be sure that all tables*** are safe from writes by other threads until the current transaction is finished.
Jun 2, 2013 at 1:42 PM
Edited Jun 2, 2013 at 1:52 PM
unruledboy wrote:
With version 50, I ran the following code (RELEASE, not Debug), and there was no other heavy disk activity in the system.

For 100K records it runs for 87 seconds. The hardware is a cheap laptop with a 5400 RPM disk, 2 cores and 4 GB RAM.

Is it the complex object model or an inefficiency in DbMJSON that causes the insert speed to be only around 1K/s?
I would suggest you start using protobuf-net. You have to configure certain things, but I am very satisfied with it. You will also save a lot of space and probably gain serialization speed, but it needs to be configured well. I myself use annotations on the model objects, but there is also a way to configure it via a config file. It also has good support for inherited objects; just be careful when assigning tag IDs, especially with ProtoInclude (for inheritance). I also usually serialize all items in bulk before insertion; on multicore machines you can use the Parallel library. Also, with protobuf-net be very careful with default values of primitive types etc. The best example is setting default states to, for example, -1 for signed types, where you have to use an annotation to re-specify the default value.
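For illustration, a minimal sketch of annotating the model from the top of this thread for protobuf-net and serializing an object to a byte[] (the tag numbers are only an example; Company/Category would need their own ProtoInclude entries):

using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
[ProtoInclude(100, typeof(BizObject))] // register the derived type for inheritance
public class BaseObject
{
    [ProtoMember(1)] public ulong ID { get; set; }
    [ProtoMember(2)] public DateTime CreatedDate { get; set; }
    [ProtoMember(3)] public string Name { get; set; }
}

[ProtoContract]
public class BizObject : BaseObject
{
    [ProtoMember(1)] public ulong CompanyID { get; set; }
    [ProtoMember(2)] public ulong CategoryID { get; set; }
    [ProtoMember(3)] public SomeTypes Some { get; set; }
    [ProtoMember(4)] public int Value { get; set; }
}

public static class BizSerializer
{
    public static byte[] Serialize(BizObject obj)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, obj); // protobuf-net serialization
            return ms.ToArray();
        }
    }
}

The resulting byte[] could then be stored instead of DbMJSON, e.g. tran.Insert<ulong, byte[]>("BizResult", r.ID, BizSerializer.Serialize(r));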
Coordinator
Jun 9, 2013 at 12:56 AM
unruledboy,
I hope you've noted that in the last example we can also optimize inserting the secondary index "BizResultIndex": in the supplied code it is not sorted ASC, which is what helps when inserting batches of random keys.
Jun 11, 2013 at 12:11 PM
But BizResultIndex depends on the main table, so we should insert into the main table and then into BizResultIndex, record by record, right?

Or do you mean we can (even if it's not business-correct?) sort the records first before inserting, like this:
SortAllRecords();
for ()
{
    var mainKey = records[i];
    InsertInMainTable(mainKey);
    InsertInBizIndexTable(subKey, mainKey);
}
Coordinator
Jun 11, 2013 at 12:15 PM
unruledboy wrote:
But BizResultIndex depends on the main table, so we should insert into the main table and then into BizResultIndex, record by record, right?

Or do you mean we can (even if it's not business-correct?) sort the records first before inserting, like this:
SortAllRecords();
for ()
{
    var mainKey = records[i];
    InsertInMainTable(mainKey);
    InsertInBizIndexTable(subKey, mainKey);
}
Exactly: you first collect the elements for insert in memory, then sort them ASC and insert.
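A rough sketch of that idea, reusing the table names from the earlier example (newObjects is the batch built beforehand, and ByteArrayComparer is a hypothetical lexicographic IComparer<byte[]> you would have to provide; the SortedDictionary keeps the secondary keys in ASC order):

using (var tran = engine.GetTransaction())
{
    tran.SynchronizeTables("BizResult", "BizResultIndex");
    tran.Technical_SetTable_OverwriteIsNotAllowed("BizResultIndex");

    // Collect the secondary-index keys in memory while inserting into the main table.
    var pending = new SortedDictionary<byte[], byte[]>(new ByteArrayComparer()); // hypothetical comparer

    foreach (var r in newObjects)
    {
        tran.Insert<ulong, DbMJSON<BizObject>>("BizResult", r.ID, r);
        var key = r.CompanyID.To_8_bytes_array_BigEndian().Concat(r.CreatedDate.To_8_bytes_array());
        pending[key] = null;
    }

    // Flush the secondary index sorted ASC - the insert pattern DBreeze likes best.
    foreach (var kv in pending)
        tran.Insert<byte[], byte[]>("BizResultIndex", kv.Key, kv.Value);

    tran.Commit();
}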