Studies of Collisions During Node ID Alias Allocation

Study of 12 Bit Alias

There's a small test program in OpenLCB SVN for estimating the probability of collisions in identifiers:

prototypes/java/test/simulations/NodeIDCollisions.java

For example, if you assign random and independent 12-bits aliases to N nodes and then put them together, what fraction of the time will there be a collision?

Nodes

Net Collision Rate

Node Collision Rate

10

0.0109

0.001096

25

0.0706

0.00292

100

0.702

0.011980



The two "collision rate" numbers are the fraction of trials in which one or more collisions were observed on the network, and the fraction of the nodes that had a collision when the network first started up.

The first example shows that 70% of the time, when you put together a 100 node network, you'll have at least one collision in the network if you use this method of assigning IDs. Any particular node, though, has only a 1% chance of having a collision and needing to retry.

The number of nodes doesn't stop at 100, though. Since the protocol allows off-CAN-segment nodes to communicate, and you have to be able to address them, the numbers can continue to rise:



Nodes

Net Collision Rate

Node Collision Rate

250

0.99962

0.0297

1000

1.0

0.1126



Large modular layouts will have collisions, but even big ones won't require nodes to retry very often.

If IDs are sequentially, as opposed to randomly, assigned by a single source, the problem goes away. For example, if a manufacturer assigns serial numbers to boards, they won't conflict. But what if two manufacturers are involved? Does that make the problem better or worse?

We model this as picking N unique numbers from each of M manufacturers who have sequentially numbered L boards so far. What's the chance then? If the manufacturers have only made a small number of boards, e.g. 100, the chance of a collision becomes pretty large:

Nodes

Number of Vendors

Number Produced

Net Collision Rate

Node Collision Rate

10

2

100

0.6672

0.0997

5

5

100

0.9375

0.4757

5

20

100

1.0

(continuous)



With 100 boards produced by each of several vendors, collisions are very likely!

If they've produced a thousand boards, the chance goes down, but it's still significant:

Nodes

Number of Vendors

Number Produced

Net Collision Rate

Node Collision Rate

10

2

1000

0.0975

0.0102

5

5

1000

0.2247

0.0501

5

20

1000

0.9928

0.9256



These rates are quite high; sequential numbering without some form of extra information (e.g. a manufacturer number) causes high collision rates.

OpenLCB's use of a seeded PRNG greatly reduces the size of this problem.

Study of 16 Bit Alias

This section describes a study of the earlier 16-bit alias proposal.

There's a small test program in SVN for estimating the probability of collisions in identifiers:

prototypes/java/test/simulations/NodeIDCollisions.java

For example, if you assign random and independent 16-bits aliases to N nodes and then put them together, what fraction of the time will there be a collision?

Nodes

Net Collision Rate

Node Collision Rate

10

0.0006727

0.00006727

25

0.0046

0.00018452

100

0.0727

.00075333



The two "collision rate" numbers are the fraction of trials in which one or more collisions were observed on the network, and the fraction of the nodes that had a collision when the network first started up.

The first example shows that 7% of the time, when you put together a 100 node network, you'll have at least one collision in the network if you use this method of assigning IDs. Any particular node, though, has only a 0.07% chance of causing a problem.

The number of nodes doesn't stop at 100, though. Since the protocol allows off-CAN-segment nodes to communicate, and you have to be able to address them, the numbers can continue to rise:



Nodes

Net Collision Rate

Node Collision Rate

250

0.3793

0.0019

1000

0.9996

0.0076



Large modular layouts will have collisions.

If IDs are sequentially, as opposed to randomly, assigned by a single source, the problem goes away. For example, if a manufacturer assigns serial numbers to boards, they won't conflict. But what if two manufacturers are involved? Does that make the problem better or worse?

We model this as picking N unique numbers from each of M manufacturers who have sequentially numbered L boards so far. What's the chance then? If the manufacturers have only made a small number of boards, e.g. 100, the chance of a collision becomes pretty large:

Nodes

Number of Vendors

Number Produced

Net Collision Rate

Node Collision Rate

10

2

100

0.6672

0.0997

5

5

100

0.9375

0.4757

5

20

100

1.0

(continuous)



With 100 boards produced by each of several vendors, collisions are very likely!

If they've produced a thousand boards, the chance goes down, but it's still significant:

Nodes

Number of Vendors

Number Produced

Net Collision Rate

Node Collision Rate

10

2

1000

0.0975

0.0102

5

5

1000

0.2247

0.0501

5

20

1000

0.9928

0.9256



These rates are quite high; sequential numbering without some form of extra information (e.g. a manufacturer number) causes high collision rates.

Properly seeding the alias algorithm can greatly reduce this problem. For example, putting manufacturer information in the top byte of the alias, and serial number in the lower, reduces the collision rate to zero for up to 255 manufactured nodes. Even in the case of a few thousand, careful choice of packing can reduce the collision rate to close to (but not exactly) zero. The collision algorithm really only earns its keep when the network has caught on enough that tens of thousands of nodes are in use. That's not such a stretch, however, as there are tens of millions of DCC decoders in operation now, and the goal of the current networking proposals is to last for a similar length of time.




This is SVN $Revision: 213 $