Table of contents
I went through the details of Arrays and LinkedLists previously and one of the significant difference between them is that inserting into the sequence is
O(n) for array, but it is
O(1) for linked list. I built a simple benchmark to show this difference and in the meantime I became more familiar of the
Imagine a situation where you have an array and other arrays needed to merged into it. Merge means in this case that:
an array is inserted middle of another array
an array is chained to another array
an array is added before another array
The task is to create a scalable solution for this problem using either array or linked list.
using array and add/insert the other array's elements one-by-one
using array and add/insert the other arrays
convert the main array to linked list and add the elements one-by-one from the other arrays
convert the arrays to linked lists and insert/add the other linked lists
create a custom linked list type, convert all arrays to linked lists and insert linked lists into linked lists
Used data types
Short evaluation of the solutions
Using array and add/insert other arrays' elements one-by-one
In case of variable size list which manages the size of its internal array for us,
List<T> provides this for us, meaning occasionally we run into a
O(n) operation when
List<T> increases its size. This is just the theoretical part, see for extra details below.
List<T>s default capacity is 4 and whenever it exceeds the capacity it doubles the capacity size. List uses
Array.Copy() for copying the values. So, we can consider this part of the operation as
However, the fact that we add the items one-by-one means that this operation is
O(n). This part slows down our solution and makes it difficult to scale.
Another issue is when
List<T> increases its internal array size. During this operation it will use minimum double size memory: the original array occupies memory, the new with increased size also occupies memory. This state stands until the copy finishes. Theoretically.
Array.Copy() implementation is
unsafe and uses
refs (you know the stuff they want to hide from you), so this issue is fully or almost fully mitigated.
This solution is
Using array and add/insert the other arrays
Compared to the previous solution it is way better as we do not iterate through the elements to be added one-by-one. Occasionally, we will see
O(n) (theoretically) operations when
List<T> increases its size, but the operation will be closer to
For size management concerns see the previous solution. Due to the dark magic
Array.Copy() does this solution is not
O(n), it is rather closer to
Convert the main array to linked lists and add the elements one-by-one from other arrays
This solution is based on the always guaranteed
O(1) inserts of
LinkedList<T>. Once we have the original array converted to linked list we can insert the other arrays' elements one-by-one, iterating through them. The fact we iterate through the other arrays and the other fact that the original array must be converted to linked list (also iterating through the array) makes this operation
O(n). Disclaimer, I didn't use the
LinkedList<T>.CopyTo(T array, int index), rather I wrote a for loop and let it go. Both are
O(n). But, compared to the array solutions it doesn't not contain occasionally array size increase and less pressure on memory due to that it doesn't built on contagious memory.
This solution is closer to
O(n) as only the main array has to be converted to
O(n) operation), but all the inserts are
O(1), however iterating through the input data is still
Convert the arrays to linked lists and add/insert them
This one sounds really good as inserting/adding linked list to linked list is
O(1) and no array size management. However,
C# implementation of
LinkedList<T> does not offer methods to chain linked lists or insert a linked list to another one. The only solution with provided data structures is adding elements one-by-one.
For this solution I had to create a custom linked list.
This operation is somewhere at the same place between
O(1) as the previous one since it also includes
linked list conversion.
I used BenchmarkDotNet for measuring the performance of the different solutions. This is the official tool for this in C#. It is also used by the .NET team for benchmarking their own code.
I was curious how performance changes when the input elements are increased. The changes were stored in a
Dictionary<int, int> where the key is the index of the changes and it is used to determine to which index the changes should be inserted. The value contains the array should be inserted.
The length of an array should be inserted is always 100. How many of these arrays we have was changed as you can see below.
[Params(100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 10000, 15000, 20000)]
The whole result file is available here.
| Method | AmountOfChanges | SizeOfAppendedArrays | ListSize | Mean | Error | StdDev | Median | Min | Max | Allocated | |------------------------------- |---------------- |--------------------- |--------- |-----------------:|-----------------:|----------------:|-----------------:|-----------------:|-----------------:|-------------:| | AddChangesToLinkedListOneByOne | 100 | 100 | 10000 | 4.319 ms | 4.466 ms | 0.2448 ms | 4.320 ms | 4.074 ms | 4.563 ms | 480,003 B | | AddChangesToListOneByOne | 100 | 100 | 10000 | 1,497.844 ms | 4,501.847 ms | 246.7614 ms | 1,504.771 ms | 1,247.692 ms | 1,741.069 ms | 51 B | | AddChangesToListAsRanges | 100 | 100 | 10000 | 158.909 ms | 506.517 ms | 27.7639 ms | 156.082 ms | 132.667 ms | 187.978 ms | 13 B | | AddChangesToLLNodesAsRanges | 100 | 100 | 10000 | 3.191 ms | 3.157 ms | 0.1730 ms | 3.244 ms | 2.998 ms | 3.331 ms | 400,013 B | | AddChangesToLinkedListOneByOne | 1000 | 100 | 10000 | 18.636 ms | 33.578 ms | 1.8405 ms | 18.178 ms | 17.067 ms | 20.662 ms | 4,800,026 B | | AddChangesToListOneByOne | 1000 | 100 | 10000 | 15,204.707 ms | 68,839.457 ms | 3,773.3229 ms | 14,777.190 ms | 11,663.351 ms | 19,173.581 ms | 408 B | | AddChangesToListAsRanges | 1000 | 100 | 10000 | 4,595.169 ms | 13,106.219 ms | 718.3961 ms | 4,622.635 ms | 3,863.434 ms | 5,299.438 ms | 27 B | | AddChangesToLLNodesAsRanges | 1000 | 100 | 10000 | 11.090 ms | 9.332 ms | 0.5115 ms | 11.060 ms | 10.595 ms | 11.617 ms | 4,000,013 B | | AddChangesToLinkedListOneByOne | 10000 | 100 | 10000 | 1,057.627 ms | 15,838.929 ms | 868.1852 ms | 584.119 ms | 529.143 ms | 2,059.618 ms | 48,001,387 B | | AddChangesToListOneByOne | 10000 | 100 | 10000 | 1,677,680.943 ms | 8,106,651.224 ms | 444,352.9083 ms | 1,670,419.527 ms | 1,237,003.244 ms | 2,125,620.059 ms | 67,109,824 B | | AddChangesToListAsRanges | 10000 | 100 | 10000 | 167,542.256 ms | 962,901.976 ms | 52,779.9065 ms | 176,982.138 ms | 110,679.386 ms | 214,965.243 ms | 93 B | | AddChangesToLLNodesAsRanges | 10000 | 100 | 10000 | 172.065 ms | 1,819.660 ms | 99.7417 ms | 140.335 ms | 92.049 ms | 283.812 ms | 40,000,408 B |
The mentioned values below are
merging these arrays in the way the new values are inserted one-by-one is the worst
(100) 1,497.844 ms
(1000) 15,204.707 ms
(10 000) 1,677,680.943 ms
(20 000) 1,120,799.038 ms
using arrays and merging them as range is not as bad as I thought due to the dark magic (unsafe code refs) in the
(100) 158.909 ms
(1000) 4,595.169 ms
(10 000) 167,542.256 ms
(20 000) 172,870.537 ms
worth to convert the arrays to linked lists and inserting the values into it, and also worth to use
(100) 4.319 ms
(1000) 18.636 ms
(10 000) 1,057.627 ms
(20 000) 581.525 ms
in this scenario it seems doesn't worth to create a custom data type for this
(100) 3.191 ms
(1000) 11.090 ms
(10 000) 172.065 ms
(20 000) 663.092 ms
Originally I was planning to test only the insert for both array and linked list, but the final implementation includes transforming from array to linked list the incoming data. Which gives an additional disadvantage to the last solution. In real life situation this is the point when you have to consider how to manage this problem. You may deal with it around the insertion, meaning on the same machine or in the same process. Or you consider sourcing out this problem to another machine or service which deals with the transformation and you'll deal with only the insertion. Or, if you are lucky, you can discuss with the source team/solution to provide the data in another form than array.
The interesting thing is that the custom linked list always performs better except when
20 000 changes are inserted. This is also something worth to check why.
What is not really pushed in this test is where the new values are inserted. If you check the code you'll see that the insertion index slowly steps ahead, and when we reach the end of the array then it
breaks out from the loop. If the code has to go back and forth a lot the performance characteristics of the solutions are changing as moving to an index is
O(1) in case of arrays and
O(n) in case of linked lists.
So, even when you take a step away from the strictly atomic problem you'll have a lot to consider in terms of performance.
Putting this test together was a good exercise, and the next will be a test really focuses only the insert.
I would really appreciate if you share your review with me. Thank you!