Distant subsets.
Version of Thursday 7 February 2019.
Dave Barber's other pages.

Section 1. This report looks at finding a collection of equal-size subsets of a master set, each subset being "sufficiently distant" from all the others. Exactly what this means will be made clear as the explanation progresses.

We are dealing with small finite sets, and naïve set theory suffices.

Elements of the sets are represented by majuscules from the earlier part of the alphabet, such as A, B, and C. These elements cannot be assumed to have any properties, except that they exist and that distinct letters stand for assuredly unequal elements.

The names of subsets of the master set are majuscules from the latter part of the alphabet, such as W, X, and Y. An exception is Z, sometimes subscripted with its size, which denotes the master set itself. For example, U might equal {C, F, H}, which will usually be written as the more compact CFH when no confusion will result. A collection of subsets of Z9 = {A, B, C, D, E, F, G, H, I} = ABCDEFGHI might be written equivalently in any of the three following formats:

{{B, C, G}, {C, F, H}, {D, E, H}}
{BCG, CFH, DEH}
BCG CFH DEH

For standardization, everything is written in alphabetical order unless there is a reason not to.

The usual term for the quantity of elements in a set is cardinality, although the shorter term size is adequate here. If two sets are the same size, we define the distance between them to be the size of either set minus the size of their intersection; or equivalently, half the size of the symmetric difference between them. Examples:

If two sets are of different sizes, we do not define a distance.

A briefer way of saying "less distant from" is "closer to".


Section 2. Now it can be more precisely asked: Given a master set, what is the largest collection of its subsets, all of a specified size, such that the distance between any pair of subsets satisfies a specified minimum?

A concise notation is DSC [m, s, d], where:

In explaining the subject, it helps to start with a concrete example, namely DSC [7, 3, 2]: Given the master set Z7 = ABCDEFG, what is the largest collection of subsets of size 3 such that the distance between any pair of them is at least 2?

For reference, here is a table of all the 35 size-3 subsets of Z:

ABC        
ABD ACDBCD   
ABE ACE ADEBCE BDECDE  
ABF ACF ADF AEFBCF BDF BEFCDF CEFDEF 
ABG ACG ADG AEG AFGBCG BDG BEG BFGCDG CEG CFGDEG DFGEFG

The 30 solutions of DSC [7, 3, 2] are listed later, but here is one of them:

ABC ADE AFG BDF BEG CDG CEF

Note for instance that ABC and ABD cannot co-exist because the distance between them is only 1.

In all nontrivial cases, the DSC problems have multiple solutions, often related by permutation. For example, exchanging A and B and re-alphabetizing gives this:

ABC ADF AEG BDE BFG CDG CEF

Perhaps surprisingly, if E and F had been exchanged instead of A and B, the result would have turned out the same. This behavior helps to illustrate why there are 30 solutions, and not 7! = 5040.

Here is a recommendation for DSC [m, s, d]:


Section 3. There is a greedy algorithm that produces a collection, but not necessarily of maximum size. Here it is, applied to DSC [7, 3, 2]:

• Step 0 in the table below lists the 35 subsets, now written vertically to make the table more graphically manageable. Each subset is preceded by a question mark, because it is not yet determined which subsets will end up in the final collection.

• Step 1 arbitrarily selects ABC (plus sign), and marks ABD, BCD, and 10 others for deletion (minus sign) because their distance from ABC is less than 2.

• Step 2 arbitrarily selects ADE and marks 8 others for deletion because their distance from ADE is less than 2.

• Step 3 selects AFG and marks 4 for deletion.

• Step 4 selects BDF and marks 3 for deletion.

• Step 5 selects BEG and marks 1 for deletion.

• Step 6 selects CDG and marks 0 for deletion.

• Step 7 selects CDG. Nothing is undetermined, so the algorithm is complete.

 greedy algorithm for DSC [7, 3, 2]
step 0 ?
A
B
C
?
A
B
D
?
A
B
E
?
A
B
F
?
A
B
G
?
A
C
D
?
A
C
E
?
A
C
F
?
A
C
G
?
A
D
E
?
A
D
F
?
A
D
G
?
A
E
F
?
A
E
G
?
A
F
G
?
B
C
D
?
B
C
E
?
B
C
F
?
B
C
G
?
B
D
E
?
B
D
F
?
B
D
G
?
B
E
F
?
B
E
G
?
B
F
G
?
C
D
E
?
C
D
F
?
C
D
G
?
C
E
F
?
C
E
G
?
C
F
G
?
D
E
F
?
D
E
G
?
D
F
G
?
E
F
G
step 1 +
A
B
C
-
A
B
D
-
A
B
E
-
A
B
F
-
A
B
G
-
A
C
D
-
A
C
E
-
A
C
F
-
A
C
G
?
A
D
E
?
A
D
F
?
A
D
G
?
A
E
F
?
A
E
G
?
A
F
G
-
B
C
D
-
B
C
E
-
B
C
F
-
B
C
G
?
B
D
E
?
B
D
F
?
B
D
G
?
B
E
F
?
B
E
G
?
B
F
G
?
C
D
E
?
C
D
F
?
C
D
G
?
C
E
F
?
C
E
G
?
C
F
G
?
D
E
F
?
D
E
G
?
D
F
G
?
E
F
G
step 2 +
A
B
C
  +
A
D
E
-
A
D
F
-
A
D
G
-
A
E
F
-
A
E
G
?
A
F
G
  -
B
D
E
?
B
D
F
?
B
D
G
?
B
E
F
?
B
E
G
?
B
F
G
-
C
D
E
?
C
D
F
?
C
D
G
?
C
E
F
?
C
E
G
?
C
F
G
-
D
E
F
-
D
E
G
?
D
F
G
?
E
F
G
step 3 +
A
B
C
  +
A
D
E
  +
A
F
G
  ?
B
D
F
?
B
D
G
?
B
E
F
?
B
E
G
-
B
F
G
  ?
C
D
F
?
C
D
G
?
C
E
F
?
C
E
G
-
C
F
G
  -
D
F
G
-
E
F
G
step 4 +
A
B
C
  +
A
D
E
  +
A
F
G
  +
B
D
F
-
B
D
G
-
B
E
F
?
B
E
G
  -
C
D
F
?
C
D
G
?
C
E
F
?
C
E
G
 
step 5 +
A
B
C
  +
A
D
E
  +
A
F
G
  +
B
D
F
  +
B
E
G
  ?
C
D
G
?
C
E
F
-
C
E
G
 
step 6 +
A
B
C
  +
A
D
E
  +
A
F
G
  +
B
D
F
  +
B
E
G
  +
C
D
G
?
C
E
F
 
step 7 +
A
B
C
  +
A
D
E
  +
A
F
G
  +
B
D
F
  +
B
E
G
  +
C
D
G
+
C
E
F
 

The final collection is ABC ADE AFG BDF BEG CDG CEF.

The next table shows all 30 collections, obtained by a non-greedy algorithm to be revealed in the next section:

# 1: ABC ADE AFG BDF BEG CDG CEF  #16: ABE ACF ADG BCG BDF CDE EFG
# 2: ABC ADE AFG BDG BEF CDF CEG#17: ABE ACG ADF BCD BFG CEF DEG
# 3: ABC ADF AEG BDE BFG CDG CEF#18: ABE ACG ADF BCF BDG CDE EFG
# 4: ABC ADF AEG BDG BEF CDE CFG#19: ABF ACD AEG BCE BDG CFG DEF
# 5: ABC ADG AEF BDE BFG CDF CEG#20: ABF ACD AEG BCG BDE CEF DFG
# 6: ABC ADG AEF BDF BEG CDE CFG#21: ABF ACE ADG BCD BEG CFG DEF
# 7: ABD ACE AFG BCF BEG CDG DEF#22: ABF ACE ADG BCG BDE CDF EFG
# 8: ABD ACE AFG BCG BEF CDF DEG#23: ABF ACG ADE BCD BEG CEF DFG
# 9: ABD ACF AEG BCE BFG CDG DEF#24: ABF ACG ADE BCE BDG CDF EFG
#10: ABD ACF AEG BCG BEF CDE DFG#25: ABG ACD AEF BCE BDF CFG DEG
#11: ABD ACG AEF BCE BFG CDF DEG#26: ABG ACD AEF BCF BDE CEG DFG
#12: ABD ACG AEF BCF BEG CDE DFG#27: ABG ACE ADF BCD BEF CFG DEG
#13: ABE ACD AFG BCF BDG CEG DEF#28: ABG ACE ADF BCF BDE CDG EFG
#14: ABE ACD AFG BCG BDF CEF DEG#29: ABG ACF ADE BCD BEF CEG DFG
#15: ABE ACF ADG BCD BFG CEG DEF#30: ABG ACF ADE BCE BDF CDG EFG

Although the greedy algorithm turns out to produce a maximum-size collection for DSC [7, 3, 2], it comes up short with DSC [8, 3, 2] among others.


Section 4. There is an algorithm to find all solutions to the DSC [m, s, d] problem, and it uses recursion.

All the subsets of Z are placed in a list in arbitrary order. At each level of the recursion two branches are executed:

Whenever the recursion reaches bottom, a collection will have been produced. Particularly in the latter stages of computation, the collections generated will often be of less than maximum size and will be discarded. In fact, the final collection will contain precisely zero subsets. As an alternative, there might be some criterion that a highly efficient program could employ to detect when no further maximum-size collections can exist, and so to terminate early, saving time.

This algorithm can be characterized as a depth-first search of a binary tree.

In the case of DSC [8, 3, 2], this algorithm produces 18 collections of size 7 before it starts finding any collections of size 8, of which there are 840. Hence the results must be examined carefully.


Section 5. Here are examples of maximum-size collections for selected values of m, s, and d. Each is whatever appeared first in the search of section 4, above. The qty column is the total number of maximum-size collections.

DSC [m, s, d]
 m  s  d qtyexample
53215 ABC ADE
62215 AB CD EF
3230 ABC ADE BDF CEF
310 ABC DEF
4215 ABCD ABEF CDEF
722105 AB CD EF
3230 ABC ADE AFG BDF BEG CDG CEF
370 ABC DEF
435 ABC
4230 ABCD ABEF ACEG ADFG BCFG BDEG CDEF
370 ABCD AEFG
52105 ABCDE ABCFG ADEFG
321 ABCDE
822105 AB CD EF GH
32840 ABC ADE AFG BDF BEH CEG CFH DGH
3280 ABC DEF
4230 ABCD ABEF ABGH ACEG ACFH ADEH ADFG BCEH BCFG BDEG BDFH CDEF CDGH EFGH
3595 ABCD AEFG
52840 ABCDE ABCFG ABDFH ACDGH AEFGH BCEFH BDEGH CDEFG
3280 ABCDE ABFGH
62105 ABCDEF ABCDGH ABEFGH CDEFGH
932840 ABC ADE AFG AHI BDF BEH BGI CDI CEG CFH DGH EFI
3280 ABC DEF GHI
437560 ABCD AEFG BEHI
4315 ABCD EFGH
537560 ABCDE ABFGH CDFGI
4315 ABCDE AFGHI
63280 ABCDEF ABCGHI DEFGHI
72945 ABCDEFG ABCDEHI ABCFGHI ADEFGHI
104330240 ABCD AEFG BEHI CFHJ DGIJ
5360480 ABCDE ABFGH ACFIJ BDGIJ CEGHI DEFHJ
43276 ABCDE AFGHI
6330240 ABCDEF ABCGHI ADEGHJ BDFGIJ CEFHIJ
41575 ABCDEF ABGHIJ
732800 ABCDEFG ABCDHIJ AEFGHIJ
82945 ABCDEFGH ABCDEFIJ ABCDGHIJ ABEFGHIJ CDEFGHIJ