-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non deterministic filter results #64
Comments
Hello, Funny enough, this problem doesn't originate from the filtering logic in column/examples/million/main.go Line 99 in a4a4570
However, for loops ranging over maps in Go are indeterministic when it comes to order. Within |
Thanks for responding so fast @Dreeseaw, it's much appreciated! If I may pick your brains about two more things; Is it bad practice to chain multiple func checkFilters(txn *column.Txn, filters []Filter) *column.Txn {
for _, filter := range filters {
txn = txn.WithValue(filter.Field, func(cellValue interface{}) bool {
return compare.IsTruthy(cellValue, filter.Comparitor, filter.Value)
})
}
return txn
} Secondly, am I right in thinking this library doesn't support ordering (either at Query time or by defining an index before a query runs)? If not, would this be a possible thing to achieve in a library like this or is it a limitation of the bitmap indexing or columnar architecture? I've looked at a few other libraries and the closest to ordering in an in-memory columnar database is arcticdb which orders records during insertion only, and once it's inserted doesn't allow any other ordering indexes to be created, or order to be defined in a query as far as I can tell. I really appreciate your time on this, I'm new to both GoLang and columnar databases! |
@james-bowers Your approach of loop filtering looks good, and no, you probably shouldn't create a new index on each value in a column. The exception to this is a column that you'd be fine trading a memory hit for a performance gain, such as one with a known distinct number of values (like "class type" in the rouge example) that is used in a lot of your transactions. To your second point, no, this library does not support any sort of "ORDER BY" statements (speaking in SQL). I'd recommend ranging through after gathering your filtered results, similar to what you see here. @kelindar I would be interested in what you think about adding sort functionality, as wouldn't it require holding a lock across the entire column that's being sorted on? Or using an MVCC linked-list so that the values remain intact throughout the range function? |
@Dreeseaw indeed, sorting would be good to add as it's a fairy common operation. I would want to avoid going the MVCC route if possible, so what we could do instead is implement a special sorted index that allows users to define a sorting column and whenever a new value is inserted/deleted, we auto update the index. |
@kelindar Hello, I've been thinking about this sorted index & wanted to try a base implementation. By creating a new Let me know what you think of this approach! |
@Dreeseaw I think it could be a good start. Not sure about google/btree specifically, would need to compare (I like https://github.com/tidwall/btree as an implementation, zero dependencies). |
Hey, it's me again
I've stumbled across a case where my filtering logic sometimes returns the correct result; a count of 5, and sometimes returns a count of 0.
I've created a gist to show the logic and the test I'm running
https://gist.github.com/james-bowers/55722b1cf2cb60d88093a3051c7e0c20
I'm wondering if there's something I'm doing wrong in the
checkFilters
function 🤔 as in the readme examples, the "where" clause is achieved by creating various indexes. However I'm confused as to why my approach sometimes works, and sometimes doesn't.Any light you can shed on the issue would be greatly appreciated.
If the recommended approach is to always create an index for each filter, what overhead is there in doing this?
Thanks so much! 🙏🏼
The text was updated successfully, but these errors were encountered: