Space reclamation in rolled-back transactions.

Mar 3, 2013 at 3:01 AM
Edited Mar 3, 2013 at 3:08 AM
Hi!

Blaze, as I already stated in the earlier mail, my problem is that rolled-back transactions keep using space in the database. The space is never reclaimed; at least
I haven't noticed any reclamation of space.

This is the state of the files in the database after a few (3-4) runs:

-rw-r--r-- 1 Krome Administrators 326465361 Mar 3 03:45 10000001
-rw-r--r-- 1 Krome Administrators 8 Mar 3 03:45 10000001.rhp
-rw-r--r-- 1 Krome Administrators 0 Mar 3 03:45 10000001.rol
-rw-r--r-- 1 Krome Administrators 159 Mar 3 03:45 _DBreezeSchema
-rw-r--r-- 1 Krome Administrators 8 Mar 3 03:45 _DBreezeSchema.rhp
-rw-r--r-- 1 Krome Administrators 0 Mar 3 03:45 _DBreezeSchema.rol
-rw-r--r-- 1 Krome Administrators 64 Mar 3 03:45 _DBreezeTranJrnl
-rw-r--r-- 1 Krome Administrators 8 Mar 3 03:45 _DBreezeTranJrnl.rhp
-rw-r--r-- 1 Krome Administrators 0 Mar 3 03:45 _DBreezeTranJrnl.rol

300 MB of rolled-back data that is never reclaimed, always just adding up. So when
is this data reclaimed? Is it a bug, or is this by design? Unless my program below
contains some logical error, I am quite worried that trash is going to accumulate
and never be reclaimed, and that I am going to run into problems in production use.

Also, I am wondering what happens if the engine is not explicitly disposed (application crash/power failure)?
Is there a recovery mechanism the next time the engine is instantiated? But my primary concern
is what I wrote above.

The following is the sample program that I've run:
static void Main (string[] args)
 {
   int i;
   int j;
   DBreezeEngine engine;
   
   DBreezeConfiguration conf = new DBreezeConfiguration ()
   {
    DBreezeDataFolderName = @"e:\tests\nosql\remi.dbr",
    Storage = DBreezeConfiguration.eStorage.DISK,
   };

   for (i = 0; i < 100; ++i)
    using (engine = new DBreezeEngine (conf))
    {
     Console.WriteLine ("Starting to write records.");
     using (var tran = engine.GetTransaction ())
     {
      try
      {
       tran.SynchronizeTables ("records");
       for (j = 0; j < 100000; ++j)
        tran.Insert<long, long> ("records", j, j);
       tran.Rollback ();
      }
      catch (Exception ex)
      {
       Console.WriteLine (ex.ToString ());
      }
     }
    }
 }
Mar 3, 2013 at 10:22 AM
Edited Mar 3, 2013 at 10:28 AM
  1. You don't need to use tran.SynchronizeTables if you change only one table per transaction.
  2. You don't need to use tran.Rollback () in that case; disposing of the transaction automatically calls this procedure, so all uncommitted changes will be rolled back.
  3. And, finally, the answer :) Yes, deleted or rolled-back elements will stay in the physical data file. You can only repack the whole table into another one (one day), if you really need that. It's normal for high-speed storage architectures.
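For example, such a repack could be sketched like this (just a sketch, not tested; table names are examples, and the schema-level DeleteTable call is from memory - check the docu):

```csharp
using DBreeze;

public static class Repack
{
    // Sketch: copy live key/value pairs into a fresh table; rolled-back
    // garbage stays behind in the old file, which is then dropped.
    public static void RepackRecords(DBreezeEngine engine)
    {
        using (var tran = engine.GetTransaction())
        {
            tran.SynchronizeTables("records", "records2");

            foreach (var row in tran.SelectForward<long, long>("records"))
                tran.Insert<long, long>("records2", row.Key, row.Value);

            tran.Commit();
        }

        // Dropping the old table removes its physical file and reclaims the space.
        engine.Scheme.DeleteTable("records");
    }
}
```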
Mar 3, 2013 at 10:27 AM
About crashes - DBreeze is ACID; read the docu and try to emulate a crash yourself - you have the source code.

Some more words concerning rollbacks. If you really store the data, you have a very small number of rollbacks.
If you use DBreeze for intermediate computations (like #tmp tables in SQL), you always have the ability - thanks to the DBreeze architecture where every table is stored in a separate file - to delete the complete file.
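A sketch of that #tmp-style pattern (table name is an example; Scheme.DeleteTable is from memory - check the docu):

```csharp
using DBreeze;

public static class TmpTables
{
    // Sketch: a throwaway table for intermediate computation; since every
    // table lives in its own file, dropping the table reclaims all space.
    public static void ComputeAndDiscard(DBreezeEngine engine)
    {
        using (var tran = engine.GetTransaction())
        {
            for (long i = 0; i < 1000; i++)
                tran.Insert<long, long>("tmp_calc", i, i * i);
            tran.Commit();
        }

        engine.Scheme.DeleteTable("tmp_calc");
    }
}
```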
Mar 3, 2013 at 11:31 AM
hhblaze wrote:
About crashes - DBreeze is ACID; read the docu and try to emulate a crash yourself - you have the source code.

Some more words concerning rollbacks. If you really store the data, you have a very small number of rollbacks.
If you use DBreeze for intermediate computations (like #tmp tables in SQL), you always have the ability - thanks to the DBreeze architecture where every table is stored in a separate file - to delete the complete file.
Yes, I've read the entire documentation today, and I had already figured out what you just said. Thank you.
Mar 3, 2013 at 5:29 PM
Hi again,

After I intentionally terminated the program without cleanup a couple of times and started it again,
I started receiving this error:
System.IndexOutOfRangeException: Index was outside the bounds of the array. 
at DBreeze.LianaTrie.LTrie.Add(Byte[]& key, Byte[]& value, Boolean& WasUpdated) in G:\Data\Users\Krome\Desktop\dbreeze-ecfd494ce323\DBreeze\LianaTrie\LianaTrie.cs:line 416
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value, Byte[]& refToInsertedValue, Boolean& WasUpdated) in G:\Data\Users\Krome\Desktop\dbreeze-ecfd494ce323\DBreeze\Transactions\Transaction.cs:line 683
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value) in G:\Data\Users\Krome\Desktop\dbreeze-ecfd494ce323\DBreeze\Transactions\Transaction.cs:line 615
   at CommonVohrsoth.Storage.NoSQL.StoreRecords(Transaction transaction, List`1 records) in E:\Development\CSharp\Test\Test\NoSQL.cs:line 87.
My code is:
public void StoreRecords (Transaction transaction, List<Record> records)
  {
   ulong id;
   Row<ulong, DbMJSON<Record>> row;

   row = transaction.Max<ulong, DbMJSON<Record>> ("records");
   id = row.Exists ? row.Key : 0UL;
   foreach (Record record in records)
   {
    ++id;
    transaction.Insert<ulong, DbMJSON<Record>> ("records", id, record);
    transaction.Insert<byte[], ulong> ("guids", record.Guid.ToByteArray (), id); <<- This is the one that generates IndexOutOfRangeException.
   }
  }
Called from this method:
using (Transaction transaction = kernel.NoSqlDatabase.Transaction)
     try
     {
      transaction.SynchronizeTables ("logs", "records", "guids");
      kernel.NoSqlDatabase.StoreRecords (transaction, records);
      if (-1 != lastMark)
      {
       log.Processed = lastMark;
       kernel.NoSqlDatabase.UpdateLog (transaction, log);
      }
      transaction.Commit ();
     }
     catch (Exception ex)
     {
      // TODO: Handle.
      LOG.DebugFormat (ex.ToString ());
     }
Mar 3, 2013 at 10:25 PM
It would be nice to have more details (a complete solution) and an exact description of the steps that must be performed to achieve such a result.
Mar 4, 2013 at 9:03 AM
Edited Mar 4, 2013 at 9:23 AM
OK, I've extracted a minimal reproduction and added some simulated delay.


Entry point:
namespace TestConsole
{
 class Program
 {
  private static readonly ILog LOG;

  static Program ()
  {
   LOG = LogManager.GetLogger (typeof (Program));
  }

  static void Main (string[] args)
  {
   DBreezeConfiguration conf;
   TestWriter tw;
   NoSQL nosql;

   conf = new DBreezeConfiguration ()
   {
    DBreezeDataFolderName = @"console.dbr",
    Storage = DBreezeConfiguration.eStorage.DISK,
   };

   XmlConfigurator.Configure ();
   nosql = new NoSQL (conf);
   nosql.Open ();
   tw = new TestWriter (nosql);
   tw.Start ();
   Console.ReadLine ();
   nosql.Close ();
  }
 }
}
The NoSQL wrapper:
namespace TestConsole
{
 public class NoSQL
 {
  private static readonly ILog LOG;
  private DBreezeEngine engine;
  private DBreezeConfiguration configuration;
 
  public Transaction Transaction
  {
   get
   {
    Transaction rt;

    rt = engine.GetTransaction ();
    return rt;
   }
  }
 
  static NoSQL ()
  {
   LOG = LogManager.GetLogger (typeof (NoSQL));
  }

  public NoSQL (DBreezeConfiguration configuration)
  {
   this.configuration = configuration;
  }
  
  public void Open ()
  {
   if (null == engine)
    engine = new DBreezeEngine (configuration);
  }

  public void Close ()
  {
   if (null != engine)
    engine.Dispose ();
   engine = null;
  }

  public void StoreRecords (Transaction transaction, List<Record> records)
  {
   ulong id;
   Row<ulong, DbMJSON<Record>> row;

   row = transaction.Max<ulong, DbMJSON<Record>> ("records");
   id = row.Exists ? row.Key : 0UL;
   foreach (Record record in records)
   {
    ++id;
    transaction.Insert<ulong, DbMJSON<Record>> ("records", id, record);
    transaction.Insert<byte[], ulong> ("guids", record.Guid.ToByteArray (), id);
   }
  }
 }
}
Where everything happens:
namespace TestConsole
{
 public class TestWriter
 {
  private static readonly ILog LOG;
  private volatile bool running;
  private readonly object monitor;
  private readonly int sleepTime;
  private Thread thread;
  private NoSQL nosql;
  
  static TestWriter ()
  {
   LOG = LogManager.GetLogger (typeof (TestWriter));
  }

  public TestWriter (NoSQL nosql)
  {
   monitor = new object ();
   this.nosql = nosql;
   this.sleepTime = 1000;
  }
 
  public void Start ()
  {
   thread = new Thread (Processor);
   thread.IsBackground = true;
   thread.Name = "Log Processor";
   thread.Start ();
  }

  public void Stop ()
  {
   lock (monitor)
   {
    running = false;
    Monitor.Pulse (monitor);
   }
   thread.Join ();
  }

  public void Pulse ()
  {
   lock (monitor)
    Monitor.Pulse (monitor);
  }

  public void Processor ()
  {
   int i;
   Thread me;

   me = Thread.CurrentThread;
   if (me != thread)
    return;
   LOG.DebugFormat ("Log processor, state=starting.");
   for (running = true; running; )
   {
    for (i = 0; i < 10; ++i) // Simulated.
     ProcessLog ();
    lock (monitor)
     Monitor.Wait (monitor, sleepTime);
   }
   LOG.DebugFormat ("Log processor, state=stopping.");
  }

  public void ProcessLog ()
  {
   int i;
   int j;
   Record record;
   DateTime start;
   DateTime end;
   List<Record> records;


   records = new List<Record> ();
   for (i = 0; i < 1; ++i) // Simulated.
   {
    Thread.Sleep (250); // Simulated.
    records.Clear ();
    for (j = 0; j < 50000; ++j)
    {
     record = ProduceFakeRecord ();
     if (null != record)
      records.Add (record);
     record = null;
    }
    Thread.Sleep (20); // Simulated.
    start = DateTime.Now;
    foreach (Record storeableRecord in records)
     storeableRecord.Address.Address = 8888;
    LOG.DebugFormat ("Start storing.");
    using (Transaction transaction = nosql.Transaction)
     try
     {
      transaction.SynchronizeTables ("logs", "records", "guids");
      nosql.StoreRecords (transaction, records);
      transaction.Commit ();
     }
     catch (Exception ex)
     {
      // TODO: Handle.
      LOG.DebugFormat (ex.ToString ());
     }
    LOG.DebugFormat ("End storing.");
    end = DateTime.Now;
    LOG.DebugFormat ("Records stored, count={0}, Time={1:F1}ms", records.Count, (end - start).TotalMilliseconds);
   }
  }

  private Record ProduceFakeRecord ()
  {
   Record rt;
   Random random;
   int type;

   random = new Random ();
   type = random.Next () % 2;
   switch (type)
   {
    case 0:
     rt = new ThemeSelected ()
     {
      Type = Record.RecordType.THEME_SELECTED,
      Guid = Guid.NewGuid (),
      LogTimestamp = DateTime.Now.Ticks / 10000L,
      Timestamp = DateTime.Now.Ticks / 10000L,
      Theme = "Blah blah"
     };
     break;

    case 1:
     rt = new FullGame ()
     {
      Type = Record.RecordType.FULL_GAME,
      Guid = Guid.NewGuid (),
      LogTimestamp = DateTime.Now.Ticks / 10000L,
      Timestamp = DateTime.Now.Ticks / 10000L,
      SpinLines = 2,

     };
     break;

    default:
     rt = null;
     break;
   }
   return rt;
  }
 }
}
Now, to simulate the failure, just run the program 5-10 times (it could even take fewer). Press Enter to break the program while it is in the middle of storing records (when the message "Start storing." appears). I've used log4net for logging purposes.

The code that actually does inserts into the database runs in a background thread. After ReadLine () in the entry point, I call
nosql.Close (), which only does engine.Dispose (). Without calling that, I haven't seen the errors afterwards. My guess
is that the problem happens when Dispose is called on the engine while some transaction is still active.

Anyway, after something unexpected happens and I run the program again, when it starts to insert the
following exception is thrown, as if something was corrupted:
04.03.2013 09:46:55,397 DEBUG TestWriter [Log Processor] - Start storing.
04.03.2013 09:46:55,451 DEBUG TestWriter [Log Processor] - System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at DBreeze.LianaTrie.LTrie.Add(Byte[]& key, Byte[]& value, Boolean& WasUpdated) in \DBreeze\LianaTrie\LianaTrie.cs:line 416
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value, Byte[]& refToInsertedValue, Boolean& WasUpdated) in \DBreeze\Transactions\Transaction.cs:line 683
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value) in \DBreeze\Transactions\Transaction.cs:line 615
   at CommonVohrsoth.Storage.NoSQL.StoreRecords(Transaction transaction, List`1 records) in \TestConsole\NoSQL.cs:line 87
   at TestConsole.TestWriter.ProcessLog() in \TestConsole\TestWriter.cs:line 140
Two additional questions:

1.) I store the GUIDs of the records in a separate table, as you can see. Is it possible to store the GUIDs in the same table, but pointing to the same data (as a secondary key)?

2.) If I use some utility to read a database file, for example less (less -I 10000001), and then run the program, the following
error occurs:

04.03.2013 10:17:30,530 DEBUG TestWriter [Log Processor] - DBreeze.Exceptions.DBreezeException: Getting table "@utrecords" from the schema failed! ---> System.IO.IOException: The process cannot access the file 'e:\Development\CSharp\Vohrsoth\TestConsole\bin\Debug\console.dbr\10000001' because it is being used by another process.

Perhaps it should block until it can open the file exclusively for read/write?
Mar 4, 2013 at 11:58 AM
Very nice, the issue is detected; very soon there will be a new release, thank you!
Mar 4, 2013 at 12:27 PM
hhblaze wrote:
Very nice, the issue is detected; very soon there will be a new release, thank you!
No problem. The thing is, I have an important project I am working on where the durability of the data must be ensured. Would you please answer the rest of the questions
that I've asked? Also, what's the best way to store only a key (just like a hash set), where the value is not important?
Mar 4, 2013 at 12:28 PM
Edited Mar 4, 2013 at 1:15 PM
Sure, only one process can access DBreeze files; it's an embedded database.
The engine starts working at the beginning of the application and is disposed at the end. For the whole application life cycle, the 3 transaction files and the 3 schema files are locked by the engine instance. The rest of the files act differently: if there is read/write activity on a table right now, then its physical file is locked; when all threads finish reading/writing the table, the table's file handle is closed and you can copy/delete/move it.

Concerning secondary indices - there are many chapters in the docu.
But briefly:
When you insert or update a row in a table, you can receive a physical pointer to the key/value.
Later, using SelectDirect and supplying that pointer as a parameter, you can retrieve those key/values.

There is also interesting stuff like NestedTables.

We use all that in following manner:

For example, we have a table with messages. Every message has its own id. We also want to traverse those messages by creation DateTime.
We decide to use one physical table for storing both the primary and secondary indices.

We create a table Message: the Key is byte[], the Values will be different:

Key  Value
0    NestedTable (the primary table for the messages)
       Key: long (MessageId)    Value: JSON Message
1    long (identity of the last message)
2    NestedTable (the secondary index by creation date)
       Key: compound DateTime (creation of the message) + MessageId    Value: pointer into the primary table
  class Message
        {
            public string Text { get; set; }
            public long Id { get; set; }
            public DateTime CreationDt { get; set; }
        }

        private void TestMessage()
        {
            DBreezeEngine eng=new DBreezeEngine("YOUR PATH");

            using (var tran = eng.GetTransaction())
            {
                string table = "Messages";

                tran.SynchronizeTables(table);

                var tblMessage = tran.InsertTable<byte[]>(table, new byte[] { 0 }, 0);  //InsertTable is used, because we want to modify it

                long msgIdentity = 0;
                var rowIdentity = tran.Select<byte[], long>(table, new byte[] { 1 });
                msgIdentity = rowIdentity.Exists ? rowIdentity.Value + 1 : 1;

                var tblSecIndex = tran.InsertTable<byte[]>(table, new byte[] { 2 }, 0);

                Message m = new Message()
                {
                    Id = msgIdentity,
                    CreationDt = DateTime.UtcNow,
                    Text = "jo!"
                };

                
                byte[] ptr=null;
                //inserting message
                tblMessage.Insert<long, DBreeze.DataTypes.DbMJSON<Message>>(m.Id, m,out ptr);
                //inserting new identity
                tran.Insert<byte[], long>(table, new byte[] { 1 }, m.Id);
                //inserting secondary using DBreeze.Utils
                byte[] secIndkey = m.CreationDt.To_8_bytes_array().Concat(m.Id.To_8_bytes_array_BigEndian());

                tblSecIndex.Insert<byte[], byte[]>(secIndkey, ptr);

                //Committing all together
                tran.Commit();
            }

            //finding message by dateTime
            using (var tran = eng.GetTransaction())
            {
                string table = "Messages";

                var tblMessage = tran.SelectTable<byte[]>(table, new byte[] { 0 }, 0);      //using SelectTable

                var tblSecIndex = tran.SelectTable<byte[]>(table, new byte[] { 2 }, 0);     //using SelectTable

                foreach (var row in tblSecIndex.SelectForward<byte[], byte[]>())    //for easiness getting all
                {
                    //Getting Message
                    var m = tblMessage.SelectDirect<long, DBreeze.DataTypes.DbMJSON<Message>>(row.Value);
                    Console.WriteLine("MessageText: {0}; ", m.Value.Get.Text);
                }

              
            }

        }
Mar 4, 2013 at 12:35 PM
The minimal amount of space for a value is occupied by the "byte" datatype. Use it for the hash set.
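A sketch of that hash-set usage (table name is an example; the value is just a dummy byte):

```csharp
using DBreeze;

public static class GuidSet
{
    // Sketch: the key itself is the set member; the value is a 1-byte dummy.
    public static void Add(DBreezeEngine engine, byte[] member)
    {
        using (var tran = engine.GetTransaction())
        {
            tran.Insert<byte[], byte>("set", member, 0);
            tran.Commit();
        }
    }

    // Membership test: only the key's existence matters.
    public static bool Contains(DBreezeEngine engine, byte[] member)
    {
        using (var tran = engine.GetTransaction())
            return tran.Select<byte[], byte>("set", member).Exists;
    }
}
```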
Mar 4, 2013 at 1:27 PM
Please, test the new release.
Mar 4, 2013 at 2:57 PM
Edited Mar 4, 2013 at 2:57 PM
Tested.
The exception we spoke about earlier, which occurred after I restarted the program, does not appear anymore. I will, however, test further later on.

However, I still get an exception when the application is closing (Dispose is called on the engine before the transaction seems to
be complete):
04.03.2013 15:44:55,455 DEBUG TestWriter [Log Processor] - DBreeze.Exceptions.TableNotOperableException: records
   at DBreeze.LianaTrie.LTrie.CheckTableIsOperable()
   at DBreeze.LianaTrie.LTrie.Add(Byte[]& key, Byte[]& value, Boolean& WasUpdated)
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value, Byte[]& refToInsertedValue, Boolean& WasUpdated)
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value)
   at CommonVohrsoth.Storage.NoSQL.StoreRecords(Transaction transaction, List`1 records) in \TestConsole\NoSQL.cs:line 88
   at TestConsole.TestWriter.ProcessLog()
OR
04.03.2013 15:44:36,079 DEBUG TestWriter [Log Processor] - DBreeze.Exceptions.TableNotOperableException: guids
   at DBreeze.LianaTrie.LTrie.CheckTableIsOperable()
   at DBreeze.LianaTrie.LTrie.Add(Byte[]& key, Byte[]& value, Boolean& WasUpdated)
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value, Byte[]& refToInsertedValue, Boolean& WasUpdated)
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value)
   at CommonVohrsoth.Storage.NoSQL.StoreRecords(Transaction transaction, List`1 records)
   at TestConsole.TestWriter.ProcessLog()
As you can see earlier in my program, I do inserts into two tables, so sometimes the exception is thrown
for the first insert and sometimes for the second. There is no exception if the engine is not disposed, since
the thread I am running is a background one and is simply shut down.
Mar 4, 2013 at 3:13 PM
Right, it appears in the case where the engine is disposed but a parallel running transaction is not finished yet.
I think it's normal; your program should just swallow this exception.
Mar 4, 2013 at 3:17 PM
Edited Mar 4, 2013 at 3:21 PM
OK, what do you mean by "a parallel running transaction is not finished yet"?
Mar 4, 2013 at 3:23 PM
I mean that the thread where you dispose the engine can differ from the threads where the transactions live.
So, the engine is disposed, but transactions running in parallel threads are still alive for a while.
Mar 4, 2013 at 3:28 PM
I made another small fix; please take the newest version for your tests.
Mar 4, 2013 at 3:31 PM
OK, will do.

Of course, in my application the main thread is the one that waited for Console.ReadLine (). After ReadLine I called Dispose on the engine,
but TestWriter runs its own thread which inserts those records.
Mar 4, 2013 at 3:37 PM
It's clear; that's why you got those errors in TestWriter, which must be swallowed with try...catch.
Mar 7, 2013 at 8:31 AM
Edited Mar 7, 2013 at 8:54 AM
About your sample:
byte[] secIndkey = m.CreationDt.To_8_bytes_array().Concat(m.Id.To_8_bytes_array_BigEndian());
1) Why do you convert the endianness of the id here?

Also a few other questions:

2) When writing a non-compound key, is it more efficient to convert it to big-endian?

3) Is there a way to store a partial byte[] array as a value? Let's say I have a fixed byte[] of size 1024 bytes and I only want to store 64 bytes of that byte array, maybe even with an index offset. It would really be handy if, for example, I use my own serialization that outputs to a MemoryStream; that way I could use MemoryStream.GetBuffer () instead of copying from the MemoryStream to a new byte[] array.

4) Can the direct pointer from an insert also be stored in a "physically" different table, to be used with SelectDirect?

5) What happens if, in a transaction, 1. you commit, and then after that 2. you roll back?

6) If you update the value of some key and the new value uses the same storage size, will it just replace the content, or add another record and take more space in storage?

7) A storage-compaction ability would really be a neat feature. Although you explained it earlier, it would still be useful where speed is desired but space is also a concern.

8) Is it more efficient to store additional indexes in a separate table, in the same table, or using a nested table (for records and all additional indexes)?

9) If I bulk-insert data into the database with a presorted index (dates), and the next bulk insert is also date-presorted but is not incremental with respect to the previous one (for example, its dates begin before dates that I have already inserted), is that a problem?

To illustrate my question:

first insert (keys): 10, 11, 12, 13, 14
second insert (keys): 5, 6, 7, 8, 9
third insert (key): 1,2,3,4.

As another more realistic example when I parse these 3 log files:
2012-11-18.log
2012-11-19.log
2012-11-20.log

Naturally, each log will contain lines/records already sorted by datetime, since they are consecutive.
I always parse the logs partially and insert into the database, but the logs themselves are not parsed in order.
The only order is among the records I get from each of them. So I might parse 2k records from 2012-11-19, then
5k from 2012-11-18, then 1k from 2012-11-20, and again 6k from 2012-11-19, etc. I hope I made
my example clear. :)

10) If a record has a secondary index (which uses direct pointers) and that key is removed, will the original
record/index also be removed?

11) I will have some more questions; I hope it's not a problem to answer them? I suppose they are/will be useful to other newcomers to this project.



Thanks.
Mar 7, 2013 at 9:14 AM
Edited Mar 7, 2013 at 9:46 AM
11) You are a very concerned person :)

1, 2)
Actually, it doesn't matter whether you store your keys in big-endian or little-endian format; the most important thing is that you later compare these keys in the same format.
LianaTrie (the base of DBreeze) is a variation of a radix trie; now you can imagine how keys are located.
The DBreeze.Utils.ByteArray conversion functions create, from any numeric type, a SORTABLE byte array of a specific length. This gives us the ability to apply comparison operations to the keys and implement Forward, Backward, FromTo, StartsWith etc. operations.

Read this branch: http://dbreeze.codeplex.com/discussions/431203

Make tests to get a feel for the DB.
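To see why the sortable (big-endian, sign-flipped) form compares correctly, here is a small self-contained illustration in plain .NET - ToSortableBytes is only a stand-in for the real DBreeze.Utils conversions:

```csharp
using System;

public static class SortableKeys
{
    // Stand-in for the DBreeze.Utils conversions: big-endian bytes of a long,
    // with the sign bit flipped so that negatives sort before positives.
    public static byte[] ToSortableBytes(long v)
    {
        byte[] b = BitConverter.GetBytes((ulong)v ^ 0x8000000000000000UL);
        if (BitConverter.IsLittleEndian) Array.Reverse(b);
        return b;
    }

    // Byte-by-byte (lexicographic) comparison, as a trie effectively performs.
    public static int CompareLex(byte[] a, byte[] b)
    {
        for (int i = 0; i < a.Length && i < b.Length; i++)
            if (a[i] != b[i]) return a[i].CompareTo(b[i]);
        return a.Length.CompareTo(b.Length);
    }

    public static void Main()
    {
        // Lexicographic order of the encoded keys matches numeric order.
        Console.WriteLine(CompareLex(ToSortableBytes(-5), ToSortableBytes(3)) < 0);   // True
        Console.WriteLine(CompareLex(ToSortableBytes(3), ToSortableBytes(100)) < 0);  // True
    }
}
```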

3)
You can partially update a value from any index using tran.InsertPart, or you can first read the full value into memory (as byte[]), then update it starting from any index (use the rich set of extensions in DBreeze.Utils.ByteArray), and then fully write it back. For now there are no other ways and no plans to enhance this behavior.
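A sketch of the InsertPart variant (the exact signature is from memory - verify against the docu; table name and offsets are examples):

```csharp
using DBreeze;

public static class PartialUpdate
{
    // Sketch: overwrite 64 bytes of an already stored value, starting at
    // byte offset 128, without rewriting the whole value.
    public static void Patch(DBreezeEngine engine, long key, byte[] patch64)
    {
        using (var tran = engine.GetTransaction())
        {
            tran.InsertPart<long, byte[]>("blobs", key, patch64, 128);
            tran.Commit();
        }
    }
}
```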

But there is another one of my favorite techniques: tran.InsertDataBlock and tran.SelectDataBlock.

When you insert a key/value pair, you can receive a pointer to this KVP to use later with table.SelectDirect.
You can store this pointer in the secondary-index table.

When you insert a value of 100 bytes, then update the same value with 80 bytes, then update it with 100 bytes again, no extra physical space is consumed: DBreeze tries to use the already allocated space, and you can update your value as many times as you like (<= 100 bytes) - it will always use the same physical space in the file. But if you next update it with 120 bytes, an absolutely new 120 bytes will be allocated, and from then on updates (<= 120 bytes) will use that same 120-byte space. The previous 100 bytes are lost and can only be removed by repacking.

So, I mean that if your updated value occupies more physical space than the previously inserted value, your pointer to the KVP will also change and must be updated in the secondary-index table.

But if your secondary-index row contains a reference to the primary-index row, and that row holds a reference to a DataBlock inside (the reference to the DataBlock is fixed at 16 bytes), then even if you update the DataBlock and update its reference in the primary table, you don't need to change the secondary-index references.
A DataBlock update, from the space point of view, acts the same as a standard value update: if there is enough space, it will be reused; if not, new space will be allocated.
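The DataBlock technique, roughly (a sketch from memory; InsertDataBlock returns the fixed 16-byte reference mentioned above, which you store as the row value):

```csharp
using DBreeze;

public static class DataBlocks
{
    // Sketch: keep the bulky payload in a DataBlock and store only its
    // fixed 16-byte reference as the row value, so secondary-index
    // pointers to the row stay valid across payload growth.
    public static void Save(DBreezeEngine engine, long id, byte[] payload, byte[] oldRef)
    {
        using (var tran = engine.GetTransaction())
        {
            // Pass the previous reference (null the first time); when the new
            // payload fits into the old space, it is reused in place.
            byte[] blockRef = tran.InsertDataBlock(oldRef, payload);
            tran.Insert<long, byte[]>("docs", id, blockRef);
            tran.Commit();
        }
    }

    public static byte[] Load(DBreezeEngine engine, long id)
    {
        using (var tran = engine.GetTransaction())
        {
            var row = tran.Select<long, byte[]>("docs", id);
            return row.Exists ? tran.SelectDataBlock(row.Value) : null;
        }
    }
}
```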
Mar 7, 2013 at 9:43 AM
4)
A pointer is only a byte[8] that refers to the file location where the key/value starts; you can store it wherever you want, but then you must supply it to SelectDirect - that's all. These pointers are a solution to make JOINs faster. You don't store a primary-table key in the secondary-index table, only a pointer to that key; sure, you must manually update it when the primary table's key or value changes.

But when you create your DAL, you write it like this:

void InsertAnObject():
  • Insert into PrimaryTable, get pointer
  • Insert into SecondaryTable that pointer
  • Insert into SecondaryTable that pointer
    ...
When you update a value, you can re-run this procedure and you will receive an automatic update in all necessary tables.
A remove-object procedure you have to write extra :)
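Such a DAL method could be sketched like this (a sketch only; table names are examples, and I'm assuming the Insert overload with an out byte[] pointer plus the DBreeze.Utils key-building extensions shown earlier):

```csharp
using System;
using DBreeze.Transactions;
using DBreeze.Utils;

public static class Dal
{
    // Sketch of InsertAnObject(): the primary insert yields a pointer,
    // and every secondary-index row stores that pointer for SelectDirect.
    public static void InsertAnObject(Transaction tran, long id, DateTime createdUtc, byte[] payload)
    {
        byte[] ptr;
        tran.Insert<long, byte[]>("objPrimary", id, payload, out ptr);

        // Secondary index: creation time + id -> pointer to the primary row.
        byte[] secKey = createdUtc.To_8_bytes_array().Concat(id.To_8_bytes_array_BigEndian());
        tran.Insert<byte[], byte[]>("objByDate", secKey, ptr);

        // Re-running this procedure for the same id refreshes both tables.
    }
}
```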

5)
All changes in all tables (inside the transaction) will be committed after you call tran.Commit(). And if you don't make any operation between tran.Commit() and tran.Rollback(), then there is nothing to roll back.

7)
Storage strategies are very different. Programmers start from the simplest tables; then they use fractal structures of the 5th level, with different values under different keys having DataBlocks inside.
Let them compact those themselves, if they think it's necessary.

8)
Hard to say. I like DBreeze's ability to take an entity projection from a production server (by copying only one DBreeze file) and copy it to the programmer's PC. I think that common data should be stored together; for example, an object and its secondary-index tables can reside physically in one file.
But it all depends...
Questions about separating the data are easier to answer.

9) There should be no problems. It may be a little bit slower than it could be and take a bit more space than it could - all insignificantly.

10) DBreeze knows nothing about your secondary-index tables. You make the DAL; you create the tables; you remove and maintain them.
Mar 7, 2013 at 10:00 AM
Thanks a lot for your input, Blaze.

Yes, I am the worried type; the software I am writing is going to run on 3k+ machines, and the data is about money. So I do have to be careful :)
Mar 7, 2013 at 10:48 AM
3K+ machines is not a sneezed ram, so good luck!
Mar 7, 2013 at 2:20 PM
What do you mean, "sneezed ram"? :) Oh, and each machine will run a database that stores approx. 4-5 million records. Later it's all collected and aggregated.
Mar 7, 2013 at 2:48 PM
"Sneezed ram" was just a joke, pointing out that your task is not an easy one, and I wished you good luck ;)
On the other hand, 3K servers is very close to a farming business.
In any case, it would be nice to know about your DBreeze integration experience, so please be in touch.
Mar 7, 2013 at 2:59 PM
Thanks :) Of course I will. Interesting project, nonetheless.