🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Matching vectors, best approach?

Started by
6 comments, last by Shaarigan 3 years, 1 month ago

Hi all,

I have a fairly straight forward problem, but so far I've only come up with solutions that are pretty cumbersome (lots of it statements and redundant code), so I was wondering if someone has a good approach to do this.

  • I have std::vectors with unsigned ints
  • depending on the use-case, 1, 2, 3 or 4 of the vectors are needed (form of filtering)
  • I need to figure out which numbers occur in all of the 1, 2, 3 or 4 vectors, depending on the filter selection in my tool

There must be a better way then doing a lot of if statements to check which vectors to use and then loop through them. For ease of use I could also have the 4 vectors be an array of vectors (e.g. std::vector<uint> myNumberVectors[4]

Any input is appreciated.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

Advertisement

Sounds like what you want is set intersection. If your containers are ordered then std::set_intersection can give you the basic building block to do 2, although you could extend it more optimally to 3 or more sets if performance is important. And there is plenty of information online about implementing set operations. With ordered sets you can do the intersection and several other operations with N sets in a single pass.

If your containers are unordered, then there isn't really a good general solution without multiple passes. You could optimise it a bit (e.g. sorting the sets themselves so the shortest is first), but probably either an ordered container or a hash set would be the way to go depending on the values.

Otherwise the thing to look at is just putting your sets themselves in another container. A nested vector here might copy data you don't need to, but you could store a pointer to each vector or similar to avoid copies.

// the unoptimal approach
bool contains(const std::vector<int> &container, int val)
{
    for (auto &i : container)
        if (i == val)
            return true;
    return false;
}
std::vector<int> unordered_set_intersection(const std::vector<std::vector<int>> &sets)
{
    std::vector<int> out;
    for (auto x : sets.front())
    {
        bool all = true;
        for (auto it = sets.begin() + 1; it != sets.end(); ++it) // except first set
        {
            if (!contains(*it, x))
            {
                all = false;
                break;
            }
        }
        if (all) out.push_back(x);
    }
    return out;
}

If the vectors are not sorted but the elements are all in the range 0..63, you can generate a bitmask of which elements occur in which vector fairly quickly. Perform bitwise ‘and’ on the bitmasks for each vector to get a single bitmask, then iterate over just one of the vectors but skip the elements where the corresponding bit in the bitmask is not set.

If the vectors are sorted, there's a fairly obvious way to do this by stepping through all vectors at once. Point iterators to the beginning of all vectors. So long as not all iterators point to the same element, find which iterator(s) point to the largest element, then increment all others. Stop when one vector is exhausted or all iterators point to the same element. If all iterators point to the same element, process that element, then increment all iterators.

If the vectors are not sorted and the values are too big to make a bitmask practical, your best bet is to sort the vectors first.

Thanks both. Some thoughts:

  • the sets are ordered in a way that the numbers increase, but not all id’s are in each vector
  • the numbers will go higher then 64
  • performance is not important here

I might have found a solution while processing the above. What I could do is create 1 new vector of ints, where the index matches the IDs. Then loop through 1, 2, 3 or 4 of the results vectors, when it’s a ‘1’, increase the number in the new vector, basically a refcount. And in the end the indices/IDs with the same refcount as amount of used filters, form the end result.

Does that make sense?

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

If it is ordered common algorithms will work to do it in a single pass, and will work for numbers, strings, whatever datatype, . So I think you are overthinking it.

std::set_intersection does it for 2, and intersection(a, intersection(b, intersection(c, d)) etc. is mathematically the same as some intersection(a, b, c, d), so if performance is not a great concern just use that. Or implement the general case using multiple iterators.

A bitmask or another vector could be an optimisation in the unordered case I showed.

Thanks, I'll dive into std::set intersection first then ? and then make the decision

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

I had a similar problem in C# and the solution was to use a dictionary of type

Dictionary<Key, int>

and iterating over the first list and storing all it's keys into it. Then iterate over the second list and see if the key is present. To get a list of intersection elements, just iterate over the dictionary and find those elements which have a number value of equal to the number of lists you want to compare.

In C++ this can be solved the same, using some kind of hash container data structure like map/unordered_map and to minimize memory allocations, initialize it with a capacity that matches the vector with the smallest amount of elements

Thanks, after comparing the different solutions I ended up with the refcount solution, which eventually was very straight forward.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

This topic is closed to new replies.

Advertisement