Removing values from DBreeze

Oct 14, 2013 at 6:37 AM
Edited Oct 14, 2013 at 6:38 AM
Hi, how can I remove a value from DBreeze? There is a RemoveKey method, but it doesn't remove the value. I use DBreeze as file storage. The documentation says only a little about this: "the old data will be not visible any more, but the old information will still reside in the table."
Are the values deleted automatically later?

This is the code I use to delete files:

try
{
    using (var tx = FileStorageEngine.GetTransaction())
    {
        var tableName = string.Format("{0}_Files", userName);
        foreach (var fileStorageId in fileStorageIds)
            tx.RemoveKey(tableName, fileStorageId);
    }
}
catch (Exception) { /* handler body not shown in the original post */ }
Oct 14, 2013 at 8:52 AM
The data will physically stay in the file, but logically it will be removed. Nothing is removed automatically later.
You must understand what physically deleting part of a file can mean: rewriting the complete file into a new one and updating the links. That takes time proportional to the file size, so the bigger the file, the longer the procedure runs.
In database techniques we rarely delete physically. If we need to, we plan the data storage structure so that the data can be deleted easily in the future.
Oct 14, 2013 at 10:58 AM
Why don't you store the files in the file system and use DBreeze only for the files' meta-information (link to the file in the file system, file name, the user the file belongs to, etc.)?
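
The suggestion above could look roughly like the following. This is a minimal sketch, not a tested implementation: the class name FileMetaStore, the filesRoot parameter, the on-disk layout, and the assumption that fileStorageId is a string that is safe to use as a file name are all hypothetical. The DBreeze calls used are Insert, Select, RemoveKey and Commit.

```csharp
using System.IO;
using DBreeze;

public static class FileMetaStore // hypothetical helper, not part of DBreeze
{
    // Writes the payload to disk and stores only its path in DBreeze.
    public static void Save(DBreezeEngine engine, string userName,
                            string fileStorageId, byte[] content, string filesRoot)
    {
        // Assumed layout: <filesRoot>/<userName>/<fileStorageId>
        string dir = Path.Combine(filesRoot, userName);
        Directory.CreateDirectory(dir);
        string path = Path.Combine(dir, fileStorageId);
        File.WriteAllBytes(path, content);

        using (var tran = engine.GetTransaction())
        {
            tran.Insert<string, string>(userName + "_Files", fileStorageId, path);
            tran.Commit();
        }
    }

    // Removes the small metadata record, then deletes the file itself,
    // so the bulk of the space is released by the file system.
    public static void Delete(DBreezeEngine engine, string userName, string fileStorageId)
    {
        string tableName = userName + "_Files";
        string path;

        using (var tran = engine.GetTransaction())
        {
            var row = tran.Select<string, string>(tableName, fileStorageId);
            if (!row.Exists) return;       // nothing stored under this id

            path = row.Value;
            tran.RemoveKey<string>(tableName, fileStorageId);
            tran.Commit();
        }

        File.Delete(path);                 // no exception if the file is already gone
    }
}
```

With this layout only the short path strings remain as "dead" data inside the DBreeze table after a delete. If File.Delete fails after the commit, an orphaned file stays on disk, so a real application would need to handle that case.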
Jan 30, 2014 at 11:10 PM
I understand why "value" data is not removed when the key is removed - for performance reasons. However, over weeks, months, etc. - a large amount of "dead" data values could end up using lots of space. It would be very nice to have a "purge" or "rebuild" method that can be called to remove dead data. I understand the purge could take a long time to run, but it would be up to us to decide when to call the purge method.
Jan 31, 2014 at 8:31 AM
Edited Feb 4, 2014 at 11:29 AM
Actually, thinking globally, maybe not that much space will be wasted. It always depends on the use case; e.g. I have no such need in my business even though I have a big data flow, because we just rarely delete and update. By the time the amount of wasted data becomes a problem, new hard drives with multi-terabyte or petabyte capacity will probably be on the market.

Nevertheless, once per year it's possible to create a transaction, close the table for writing inside it, and then copy all data from one table into another. The newly created table will contain only the active datasets and will be smaller than the old table. Then the old table can be deleted and the new table renamed to the old table's name. This "deleting the old table and renaming the new table" step can cause errors in parallel threads that want to read data from the old, deleted table. It can happen within about 1 second and only in reading threads; of course it will lead to an exception. Every program must handle exceptions, it's normal. There will be no data loss and the saving logic will not break.
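
The copy-and-swap procedure described above might be sketched as follows. It is a sketch under stated assumptions, not a definitive implementation: it assumes a flat table with string keys and byte[] values (no nested tables), and the class name TableCompactor and the "_compact" suffix are hypothetical. It uses the DBreeze calls SynchronizeTables, SelectForward, Insert and Commit, plus Scheme.DeleteTable and Scheme.RenameTable. Whether the delete and rename can safely run while the copying transaction is still open has not been verified here, so the sketch performs them afterwards; the application must keep writers away from the table until the swap is done.

```csharp
using DBreeze;

public static class TableCompactor // hypothetical helper, not part of DBreeze
{
    // Copies the live rows of a <string, byte[]> table into a fresh table,
    // then swaps the fresh table in under the old name.
    public static void Compact(DBreezeEngine engine, string tableName)
    {
        string tempName = tableName + "_compact"; // assumed temporary table name

        using (var tran = engine.GetTransaction())
        {
            // Reserve both tables for writing inside this transaction.
            tran.SynchronizeTables(tableName, tempName);

            // Only rows that are still logically present are returned,
            // so removed and overwritten values are not copied.
            foreach (var row in tran.SelectForward<string, byte[]>(tableName))
            {
                tran.Insert<string, byte[]>(tempName, row.Key, row.Value);
            }

            tran.Commit();
        }

        // Short window in which parallel readers of the old table may get
        // an exception and should retry.
        engine.Scheme.DeleteTable(tableName);
        engine.Scheme.RenameTable(tempName, tableName);
    }
}
```

A table with other key or value types, or with nested tables, needs its own copy loop, which is the reason given in this thread for not having a generic procedure.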

There is no automatic table-to-table copy procedure, because data stored in DBreeze tables (including nested tables) is very heterogeneous, so the programmer, who knows the table structure, currently has to write this copy procedure manually. In theory such a procedure could be made automatic, because a nested table always starts with a 64-byte block that always contains a "" stamp, and at the low level any datatype stored in DBreeze is only byte[]. The problem is that pointers to data blocks are hard to identify.
Feb 4, 2014 at 11:30 AM
Edited Feb 4, 2014 at 11:31 AM
If data in the table is updated/removed many times and there is a point in time when the table must be empty, then
tran.RemoveAllKeys("table", true); (the command that removes all keys and recreates the table) can be used.