Removal of keys.

Aug 8, 2013 at 6:41 PM
Edited Aug 8, 2013 at 7:14 PM
Hi blaze, I'd like to get back to our discussion:
https://dbreeze.codeplex.com/discussions/439648

I already posted an example of code in that discussion thread (just before your last answer):
the PurgeRecordsByGuid method.

Now I've been doing some tests. As you can see in the example, I have 2 nested tables in one
physical table: Guid and Data.

The Data table has the form <ulong, byte[]>.
The Guid table has the form <byte[], ulong>, where byte[] is actually the guid converted to byte[].

I am using sequential guids (the kind you suggested, from this site):
http://csharptest.net/1250/why-guid-primary-keys-are-a-databases-worst-nightmare/

So what I do is:
I receive a List of guids, which I use to query the Guid table and get a ulong value.
This ulong value is the key for the Data table.
Then I remove the guid key from the Guid table,
and then I remove the key from the Data table.
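The lookup-then-delete sequence above can be sketched without DBreeze, using two in-memory SortedDictionary instances as stand-ins for the Guid and Data nested tables. The class and method names, and the use of hex strings in place of the byte[] guid keys, are assumptions for illustration only:

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for the two nested tables:
//   guidTable: guid key (hex string) -> ulong id
//   dataTable: ulong id -> payload
public static class PurgeSketch
{
    public static List<ulong> PurgeByGuid(
        SortedDictionary<string, ulong> guidTable,
        SortedDictionary<ulong, byte[]> dataTable,
        IEnumerable<string> guidsToPurge)
    {
        var removedIds = new List<ulong>();
        foreach (string guid in guidsToPurge)
        {
            // Step 1: look up the ulong id stored under this guid (Select + row.Exists).
            if (!guidTable.TryGetValue(guid, out ulong id))
                continue; // guid not present: nothing to delete

            // Step 2: remove the guid key from the Guid table.
            guidTable.Remove(guid);
            removedIds.Add(id);
        }

        // Step 3: remove the collected ids from the Data table.
        foreach (ulong removedId in removedIds)
            dataTable.Remove(removedId);

        return removedIds;
    }
}
```

In the real code, TryGetValue plays the role of Select followed by the row.Exists check, and Remove plays the role of RemoveKey.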

So far all is fine and well. But I've been doing some timing tests, and if you look
at this part of the method:
for (i = 0; i < sortedGuids.Count; ++i)
{
    rawGuid = sortedGuids[i];
    row = dt.Guid.Select<byte[], ulong>(rawGuid, true);
    if (row.Exists)
    {
        id = row.Value;
        keys.Add(id);
        dt.Guid.RemoveKey<byte[]>(rawGuid);
    }
}
Most of the time here is actually spent on the Select itself. Do you have any suggestions for this?

I've been experimenting and replaced the code above with the following:
sw.Restart();
if (sortedGuids.Count > 0)
{
    startGuid = sortedGuids[0];
    stopGuid = sortedGuids[sortedGuids.Count - 1];
    foreach (Row<byte[], ulong> row2 in dt.Guid.SelectForwardFromTo<byte[], ulong>(startGuid, true, stopGuid, true))
    {
        if (row2.Exists)
        {
            rawGuid = row2.Key;
            if (dictionary.ContainsKey(rawGuid.ToBytesString()))
            {
                id = row2.Value;
                keys.Add(id);
                dt.Guid.RemoveKey<byte[]>(rawGuid);
            }
            else
            {
                ++falseRecords;
            }
        }
    }
}
sw.Stop();
This can work much better, but sometimes some of the entries are not the ones
I wish to delete; that's why I check the dictionary to see whether I should delete them or not.
The falseRecords variable counts those records.
Sometimes it's 0, sometimes it can be as many as 1 million, 5 million or even more false ones.
I would really like to improve this, so please give me some suggestions.
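The falseRecords count grows because a from-to scan visits every stored key between the smallest and the largest wanted key, not just the wanted ones. A minimal sketch of that effect, using a SortedSet<long> as a stand-in for the Guid table and long values in place of guid bytes (both assumptions for illustration; RangeScanSketch is a hypothetical name):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RangeScanSketch
{
    // Counts the keys a from-to scan between the smallest and largest wanted key
    // visits that are NOT in the wanted set (the equivalent of falseRecords).
    public static int CountFalseRecords(SortedSet<long> storedKeys, IReadOnlyCollection<long> wantedKeys)
    {
        if (wantedKeys.Count == 0) return 0;

        var wanted = new HashSet<long>(wantedKeys);
        long from = wantedKeys.Min();
        long to = wantedKeys.Max();

        int falseRecords = 0;
        // GetViewBetween yields every stored key in [from, to], inclusive on both ends.
        foreach (long key in storedKeys.GetViewBetween(from, to))
        {
            if (!wanted.Contains(key))
                falseRecords++;
        }
        return falseRecords;
    }
}
```

With keys 1..1000 stored and only 10 and 990 wanted, the scan visits 981 keys and 979 of them are false records, while two point lookups would touch only the two wanted keys.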
Coordinator
Aug 8, 2013 at 7:13 PM
Edited Aug 8, 2013 at 7:29 PM
Before implementing a special RemoveKey overload that can return the deleted value, I've got something for you:
private void testF_007()
        {
            using (var tran = engine.GetTransaction())
            {
                tran.Insert<byte[],int>("A",new byte[] {1,2,3},1);
                tran.Insert<byte[],int>("A",new byte[] {1,2,4},1);

                tran.Commit();
            }

            using (var tran = engine.GetTransaction())
            {
                bool wasRemoved = true;

                foreach (var row in tran.SelectForward<byte[], int>("A"))
                {
                    tran.RemoveKey<byte[]>("A", row.Key, out wasRemoved);
                    if (wasRemoved)
                    {
                        var xrow = tran.Select<byte[], int>("A", row.Key, true);
                        Console.WriteLine("You have deleted key {0} value {1}", xrow.Key.ToBytesString(), xrow.Value);
                    }

                    tran.Commit();
                }
            }

            using (var tran = engine.GetTransaction())
            {
                Console.WriteLine(tran.Count("A"));
            }
        }
Aug 8, 2013 at 8:19 PM
Edited Aug 8, 2013 at 8:28 PM
Also, how much difference is there between DBreeze versions 53 and 55? In production I am currently running 53. Is the latest one stable enough?
Is anything incompatible?
Aug 8, 2013 at 8:36 PM
Yes, this would work assuming I want to remove all keys. But it would also remove keys I don't want removed.
Coordinator
Aug 9, 2013 at 3:49 AM
Edited Aug 9, 2013 at 4:03 AM
krome wrote:
Also, how much difference is there between DBreeze versions 53 and 55? In production I am currently running 53. Is the latest one stable enough?
Is anything incompatible?
You can upgrade; there are no incompatibilities, just enhancements. Take 56 from the source code, which I have prepared to test TranJrnl growth control. The same DLL will be in release 56.
Coordinator
Aug 9, 2013 at 4:00 AM
krome wrote:
Yes, this would work assuming I want to remove all keys. But it would also remove keys I don't want removed.
As I understand it:
  • you have an array of sortedGuids in memory
  • you also have, in memory, a Dictionary of Guids (from this sortedGuids array) which you don't want to delete
  • DBreeze contains all sortedGuids plus more Guids
  • you want to remove from DBreeze only sortedGuids minus the Guids from the Dictionary
Is that right?
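The four points above amount to a set difference computed in memory: keys to delete = sortedGuids minus the dictionary keys, kept in sorted order. A small sketch, using hex strings (as produced by ToBytesString()) as stand-ins for the byte[] guids; KeysToDeleteSketch is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class KeysToDeleteSketch
{
    // keysInMemory: the sortedGuids list, as hex strings (assumption for illustration).
    // keysToKeep:   keys that must not be deleted; pass a HashSet or Dictionary.Keys
    //               so that Contains is a constant-time lookup.
    // Returns the keys to remove, preserving the original (sorted) order.
    public static List<string> Compute(IEnumerable<string> keysInMemory, ICollection<string> keysToKeep)
    {
        return keysInMemory.Where(k => !keysToKeep.Contains(k)).ToList();
    }
}
```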
Aug 10, 2013 at 1:45 AM
I want to remove only the guids that are on my list. Since I used SelectForwardFromTo, it will also give me some guids that are not on the list.
The problem is that SelectForward* can return a lot of guids in between that I am not supposed to delete. Two things I wanted to ask you:
1) Could you make a RemoveKey overload that returns the value of the removed entry?
2) How about some sort of bulk select, like a list of keys to select? I bet a lot of time is spent
calling Select on each key; maybe that can be lowered? Perhaps by being able to pass a List/IEnumerable/array of keys to select?
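A caller-side version of such a bulk select can be sketched as a helper that sorts the batch once and then issues one lookup per key in ascending order. This is a hypothetical helper, not a DBreeze API; the TryLookup delegate is an assumption standing in for a per-key Select plus row.Exists check:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BulkSelectSketch
{
    // Shape of a single-key lookup: returns true and the value when the key exists.
    public delegate bool TryLookup<TKey, TValue>(TKey key, out TValue value);

    // Sort the batch once, then look keys up in ascending order so that consecutive
    // lookups move forward through the key space instead of jumping around.
    // Note: Distinct() uses default equality, which for byte[] is reference equality.
    public static List<KeyValuePair<TKey, TValue>> SelectSorted<TKey, TValue>(
        IEnumerable<TKey> keys, IComparer<TKey> comparer, TryLookup<TKey, TValue> tryLookup)
    {
        var found = new List<KeyValuePair<TKey, TValue>>();
        foreach (TKey key in keys.Distinct().OrderBy(k => k, comparer))
        {
            if (tryLookup(key, out TValue value))
                found.Add(new KeyValuePair<TKey, TValue>(key, value));
        }
        return found;
    }
}
```

For an in-memory test, a SortedDictionary's TryGetValue method can be passed directly as the lookup delegate.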
Coordinator
Aug 10, 2013 at 8:34 AM
Edited Aug 10, 2013 at 10:25 AM
In your first example
for (i = 0; i < sortedGuids.Count; ++i)
{
    rawGuid = sortedGuids[i];
    ..
    row = dt.Guid.Select<byte[], ulong>(rawGuid, true);
you were iterating over all keys in the list, including keys which didn't need to be iterated. Not good for the HDD.

In your second example
foreach (Row<byte[], ulong> row2 in dt.Guid.SelectForwardFromTo<byte[], ulong> (startGuid, true, stopGuid, true))
...
if (dictionary.ContainsKey (rawGuid.ToBytesString ()))
you also iterate over all keys, losing time on the sectors (between startGuid and stopGuid) which don't need to be iterated. Not good for the HDD.

My suggestion is to use something more like the first variant. Also, answering your question 2): these standalone selects work quite fast if you request them in sorted order. They work as fast as SelectForward would:
Let sortedGuids be the array of all keys in memory, already sorted in ascending order, and
let Dictionary<string,byte[]> dt = ... be the dictionary of keys which should not be deleted (keyed by key.ToBytesString()):

bool wasRemoved = false;

foreach (var key in sortedGuids)  //in memory we iterate the full list (already sorted ascending)
{
   if (dt.ContainsKey(key.ToBytesString()))
      continue;    //but we skip keys which should not be deleted and make life of DBreeze and HDD easier

   //this key we must remove from DBreeze, preliminarily taking its content
   tran.RemoveKey<byte[]>("A", key, out wasRemoved);

   if (wasRemoved)
   {
      //key exists in DBreeze, so we want to get its value and store it somewhere else
      var xrow = tran.Select<byte[], int>("A", key, true);
      //Note: the Select above creates an instance of the Liana Trie the first time and reuses it afterwards; it will not be closed until the end of the transaction.
      //Moreover, every subsequent Select will not jump again from the root, but will continue from the place where the previous one stopped.
      //This feature is very important for the traversal speed of sorted data, and you can use it.

      //Of course, a RemoveKey overload returning the deleted value would be even easier and a bit faster. I will think about its implementation in the near future... time... time... time...
   }

   tran.Commit();
}
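The loop above relies on sortedGuids already being in the same order as the keys stored in DBreeze. If the list arrives unsorted, it can be sorted in memory first. A minimal comparer sketch, assuming that byte[] keys are ordered byte by byte as unsigned values, with a shorter key that is a prefix of a longer one sorting first (an assumption, not something confirmed in this thread); ByteArrayComparer is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Orders byte[] keys left to right by unsigned byte value; if one key is a prefix
// of the other, the shorter key sorts first. Assumed to match the store's key order.
public sealed class ByteArrayComparer : IComparer<byte[]>
{
    public static readonly ByteArrayComparer Instance = new ByteArrayComparer();

    public int Compare(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int n = Math.Min(x.Length, y.Length);
        for (int i = 0; i < n; i++)
        {
            // byte is unsigned in C#, so 0x80 sorts after 0x7F.
            if (x[i] != y[i])
                return x[i] < y[i] ? -1 : 1;
        }
        return x.Length.CompareTo(y.Length);
    }
}
```

Usage: sortedGuids.Sort(ByteArrayComparer.Instance); before running the loop, when sortedGuids is a List<byte[]>.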
Coordinator
Aug 11, 2013 at 10:55 AM
So, when the weather is bad there is always more time for doing unnecessary things... that's why you can now take the newest DBreeze with the RemoveKey overload (read the documentation).
Aug 11, 2013 at 11:27 AM
:)