Section 1. This report looks at finding a collection of equal-size subsets of a master set, each subset being "sufficiently distant" from all the others. Exactly what this means will be made clear as the explanation progresses.
We are dealing with small finite sets, and naïve set theory suffices.
Elements of the sets are represented by majuscules from the earlier part of the alphabet, such as A, B, and C. These elements cannot be assumed to have any properties, except that they exist and that distinct letters stand for assuredly unequal elements.
The names of subsets of the master set are majuscules from the latter part of the alphabet, such as W, X, and Y. An exception is Z, sometimes subscripted with its size, which denotes the master set itself. For example, U might equal {C, F, H}, which will usually be written as the more compact CFH when no confusion will result. A collection of subsets of Z_{9} = {A, B, C, D, E, F, G, H, I} = ABCDEFGHI might be written equivalently in any of the three following formats:
{{B, C, G}, {C, F, H}, {D, E, H}}
{BCG, CFH, DEH}
BCG CFH DEH
For standardization, everything is written in alphabetical order unless there is a reason not to.
The usual term for the quantity of elements in a set is cardinality, although the shorter term size is adequate here. If two sets are the same size, we define the distance between them to be the size of either set minus the size of their intersection; or equivalently, half the size of the symmetric difference between them. Examples:
If two sets are of different sizes, we do not define a distance.
A briefer way of saying "less distant from" is "closer to".
Section 2. Now it can be more precisely asked: Given a master set, what is the largest collection of its subsets, all of a specified size, such that the distance between any pair of subsets satisfies a specified minimum?
A concise notation is DSC [m, s, d], where:
In explaining the subject, it helps to start with a concrete example, namely DSC [7, 3, 2]: Given the master set Z_{7} = ABCDEFG, what is the largest collection of subsets of size 3 such that the distance between any pair of them is at least 2?
For reference, here is a table of all the 35 size-3 subsets of Z:
ABC | ||||||||
ABD ACD | BCD | |||||||
ABE ACE ADE | BCE BDE | CDE | ||||||
ABF ACF ADF AEF | BCF BDF BEF | CDF CEF | DEF | |||||
ABG ACG ADG AEG AFG | BCG BDG BEG BFG | CDG CEG CFG | DEG DFG | EFG |
The 30 solutions of DSC [7, 3, 2] are listed later, but here is one of them:
ABC ADE AFG BDF BEG CDG CEF
Note for instance that ABC and ABD cannot co-exist because the distance between them is only 1.
In all nontrivial cases, the DSC problems have multiple solutions, often related by permutation. For example, exchanging A and B and re-alphabetizing gives this:
ABC ADF AEG BDE BFG CDG CEF
Perhaps surprisingly, if E and F had been exchanged instead of A and B, the result would have turned out the same. This behavior helps to illustrate why there are 30 solutions, and not 7! = 5040.
Here is a recommendation for DSC [m, s, d]:
Section 3. There is a greedy algorithm that produces a collection, but not necessarily of maximum size. Here it is, applied to DSC [7, 3, 2]:
• Step 0 in the table below lists the 35 subsets, now written vertically to make the table more graphically manageable. Each subset is preceded by a question mark, because it is not yet determined which subsets will end up in the final collection.
• Step 1 arbitrarily selects ABC (plus sign), and marks ABD, BCD, and 10 others for deletion (minus sign) because their distance from ABC is less than 2.
• Step 2 arbitrarily selects ADE and marks 8 others for deletion because their distance from ADE is less than 2.
• Step 3 selects AFG and marks 4 for deletion.
• Step 4 selects BDF and marks 3 for deletion.
• Step 5 selects BEG and marks 1 for deletion.
• Step 6 selects CDG and marks 0 for deletion.
• Step 7 selects CDG. Nothing is undetermined, so the algorithm is complete.
greedy algorithm for DSC [7, 3, 2] | |||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
step 0 | ? A B C | ? A B D | ? A B E | ? A B F | ? A B G | ? A C D | ? A C E | ? A C F | ? A C G | ? A D E | ? A D F | ? A D G | ? A E F | ? A E G | ? A F G | ? B C D | ? B C E | ? B C F | ? B C G | ? B D E | ? B D F | ? B D G | ? B E F | ? B E G | ? B F G | ? C D E | ? C D F | ? C D G | ? C E F | ? C E G | ? C F G | ? D E F | ? D E G | ? D F G | ? E F G |
step 1 | + A B C | - A B D | - A B E | - A B F | - A B G | - A C D | - A C E | - A C F | - A C G | ? A D E | ? A D F | ? A D G | ? A E F | ? A E G | ? A F G | - B C D | - B C E | - B C F | - B C G | ? B D E | ? B D F | ? B D G | ? B E F | ? B E G | ? B F G | ? C D E | ? C D F | ? C D G | ? C E F | ? C E G | ? C F G | ? D E F | ? D E G | ? D F G | ? E F G |
step 2 | + A B C | + A D E | - A D F | - A D G | - A E F | - A E G | ? A F G | - B D E | ? B D F | ? B D G | ? B E F | ? B E G | ? B F G | - C D E | ? C D F | ? C D G | ? C E F | ? C E G | ? C F G | - D E F | - D E G | ? D F G | ? E F G | ||||||||||||
step 3 | + A B C | + A D E | + A F G | ? B D F | ? B D G | ? B E F | ? B E G | - B F G | ? C D F | ? C D G | ? C E F | ? C E G | - C F G | - D F G | - E F G | ||||||||||||||||||||
step 4 | + A B C | + A D E | + A F G | + B D F | - B D G | - B E F | ? B E G | - C D F | ? C D G | ? C E F | ? C E G | ||||||||||||||||||||||||
step 5 | + A B C | + A D E | + A F G | + B D F | + B E G | ? C D G | ? C E F | - C E G | |||||||||||||||||||||||||||
step 6 | + A B C | + A D E | + A F G | + B D F | + B E G | + C D G | ? C E F | ||||||||||||||||||||||||||||
step 7 | + A B C | + A D E | + A F G | + B D F | + B E G | + C D G | + C E F |
The final collection is ABC ADE AFG BDF BEG CDG CEF.
The next table shows all 30 collections, obtained by a non-greedy algorithm to be revealed in the next section:
# 1: ABC ADE AFG BDF BEG CDG CEF | #16: ABE ACF ADG BCG BDF CDE EFG | |
# 2: ABC ADE AFG BDG BEF CDF CEG | #17: ABE ACG ADF BCD BFG CEF DEG | |
# 3: ABC ADF AEG BDE BFG CDG CEF | #18: ABE ACG ADF BCF BDG CDE EFG | |
# 4: ABC ADF AEG BDG BEF CDE CFG | #19: ABF ACD AEG BCE BDG CFG DEF | |
# 5: ABC ADG AEF BDE BFG CDF CEG | #20: ABF ACD AEG BCG BDE CEF DFG | |
# 6: ABC ADG AEF BDF BEG CDE CFG | #21: ABF ACE ADG BCD BEG CFG DEF | |
# 7: ABD ACE AFG BCF BEG CDG DEF | #22: ABF ACE ADG BCG BDE CDF EFG | |
# 8: ABD ACE AFG BCG BEF CDF DEG | #23: ABF ACG ADE BCD BEG CEF DFG | |
# 9: ABD ACF AEG BCE BFG CDG DEF | #24: ABF ACG ADE BCE BDG CDF EFG | |
#10: ABD ACF AEG BCG BEF CDE DFG | #25: ABG ACD AEF BCE BDF CFG DEG | |
#11: ABD ACG AEF BCE BFG CDF DEG | #26: ABG ACD AEF BCF BDE CEG DFG | |
#12: ABD ACG AEF BCF BEG CDE DFG | #27: ABG ACE ADF BCD BEF CFG DEG | |
#13: ABE ACD AFG BCF BDG CEG DEF | #28: ABG ACE ADF BCF BDE CDG EFG | |
#14: ABE ACD AFG BCG BDF CEF DEG | #29: ABG ACF ADE BCD BEF CEG DFG | |
#15: ABE ACF ADG BCD BFG CEG DEF | #30: ABG ACF ADE BCE BDF CDG EFG |
Although the greedy algorithm turns out to produce a maximum-size collection for DSC [7, 3, 2], it comes up short with DSC [8, 3, 2] among others.
Section 4. There is an algorithm to find all solutions to the DSC [m, s, d] problem, and it uses recursion.
All the subsets of Z are placed in a list in arbitrary order. At each level of the recursion two branches are executed:
Whenever the recursion reaches bottom, a collection will have been produced. Particularly in the latter stages of computation, the collections generated will often be of less than maximum size and will be discarded. In fact, the final collection will contain precisely zero subsets. As an alternative, there might be some criterion that a highly efficient program could employ to detect when no further maximum-size collections can exist, and so to terminate early, saving time.
This algorithm can be characterized as a depth-first search of a binary tree.
In the case of DSC [8, 3, 2], this algorithm produces 18 collections of size 7 before it starts finding any collections of size 8, of which there are 840. Hence the results must be examined carefully.
Section 5. Here are examples of maximum-size collections for selected values of m, s, and d. Each is whatever appeared first in the search of section 4, above. The qty column is the total number of maximum-size collections.
DSC [m, s, d] | ||||
---|---|---|---|---|
m | s | d | qty | example |
5 | 3 | 2 | 15 | ABC ADE |
6 | 2 | 2 | 15 | AB CD EF |
3 | 2 | 30 | ABC ADE BDF CEF | |
3 | 10 | ABC DEF | ||
4 | 2 | 15 | ABCD ABEF CDEF | |
7 | 2 | 2 | 105 | AB CD EF |
3 | 2 | 30 | ABC ADE AFG BDF BEG CDG CEF | |
3 | 70 | ABC DEF | ||
4 | 35 | ABC | ||
4 | 2 | 30 | ABCD ABEF ACEG ADFG BCFG BDEG CDEF | |
3 | 70 | ABCD AEFG | ||
5 | 2 | 105 | ABCDE ABCFG ADEFG | |
3 | 21 | ABCDE | ||
8 | 2 | 2 | 105 | AB CD EF GH |
3 | 2 | 840 | ABC ADE AFG BDF BEH CEG CFH DGH | |
3 | 280 | ABC DEF | ||
4 | 2 | 30 | ABCD ABEF ABGH ACEG ACFH ADEH ADFG BCEH BCFG BDEG BDFH CDEF CDGH EFGH | |
3 | 595 | ABCD AEFG | ||
5 | 2 | 840 | ABCDE ABCFG ABDFH ACDGH AEFGH BCEFH BDEGH CDEFG | |
3 | 280 | ABCDE ABFGH | ||
6 | 2 | 105 | ABCDEF ABCDGH ABEFGH CDEFGH | |
9 | 3 | 2 | 840 | ABC ADE AFG AHI BDF BEH BGI CDI CEG CFH DGH EFI |
3 | 280 | ABC DEF GHI | ||
4 | 3 | 7560 | ABCD AEFG BEHI | |
4 | 315 | ABCD EFGH | ||
5 | 3 | 7560 | ABCDE ABFGH CDFGI | |
4 | 315 | ABCDE AFGHI | ||
6 | 3 | 280 | ABCDEF ABCGHI DEFGHI | |
7 | 2 | 945 | ABCDEFG ABCDEHI ABCFGHI ADEFGHI | |
10 | 4 | 3 | 30240 | ABCD AEFG BEHI CFHJ DGIJ |
5 | 3 | 60480 | ABCDE ABFGH ACFIJ BDGIJ CEGHI DEFHJ | |
4 | 3276 | ABCDE AFGHI | ||
6 | 3 | 30240 | ABCDEF ABCGHI ADEGHJ BDFGIJ CEFHIJ | |
4 | 1575 | ABCDEF ABGHIJ | ||
7 | 3 | 2800 | ABCDEFG ABCDHIJ AEFGHIJ | |
8 | 2 | 945 | ABCDEFGH ABCDEFIJ ABCDGHIJ ABEFGHIJ CDEFGHIJ |