Algorithm for N-way merge












76















A 2-way merge is widely studied as a part of Mergesort algorithm.
But I am interested to find out the best way one can perform an N-way merge?



Lets say, I have N files which have sorted 1 million integers each.
I have to merge them into 1 single file which will have those 100 million sorted integers.



Please keep in mind that use case for this problem is actually external sorting which is disk based. Therefore, in real scenarios there would be memory limitation as well. So a naive approach of merging 2 files at a time (99 times) won't work. Lets say we have only a small sliding window of memory available for each array.



I am not sure if there is already a standardized solution to this N-way merge. (Googling didn't tell me much).



But if you know if a good n-way merge algorithm, please post algo/link.



Time complexity: If we greatly increase the number of files (N) to be merged, how would that affect the time complexity of your algorithm?



Thanks for your answers.



I haven't been asked this anywhere, but I felt this could be an interesting interview question. Therefore tagged.










share|improve this question























  • Why won't the pairwise merge of files work?

    – user97370
    Feb 20 '11 at 8:03






  • 7





    For the record, this is known as balance line or k-way merge. Balance line algorithms usually have O(kn) time complexity where k is the number of files and n is the total number of items, whereas heap k-way merges are usually O(n log k). Both algorithms have O(k) space complexity.

    – user557219
    Feb 20 '11 at 8:07













  • @paul, ok I must admit that I was wrong when I said that merging 2 files at a time won't work due to space considerations. But in I think it would be a very slow algo, that is why it won't work.

    – bits
    Feb 20 '11 at 8:17











  • @Bavarious can you say why you think merging like this is O(kn). It seems to me like it's O(n log k) (since merging each pair is O(n) and you have to do it O(log k) times to reduce down to a single file). It uses only O(1) space too.

    – user97370
    Feb 20 '11 at 9:06











  • @PaulHankin Balance line keeps an unsorted array (instead of a heap) with the last key read from each input file, hence k in both O(kn) and O(k). It is good enough for small k.

    – user557219
    Feb 20 '11 at 9:11


















76















A 2-way merge is widely studied as a part of Mergesort algorithm.
But I am interested to find out the best way one can perform an N-way merge?



Lets say, I have N files which have sorted 1 million integers each.
I have to merge them into 1 single file which will have those 100 million sorted integers.



Please keep in mind that use case for this problem is actually external sorting which is disk based. Therefore, in real scenarios there would be memory limitation as well. So a naive approach of merging 2 files at a time (99 times) won't work. Lets say we have only a small sliding window of memory available for each array.



I am not sure if there is already a standardized solution to this N-way merge. (Googling didn't tell me much).



But if you know if a good n-way merge algorithm, please post algo/link.



Time complexity: If we greatly increase the number of files (N) to be merged, how would that affect the time complexity of your algorithm?



Thanks for your answers.



I haven't been asked this anywhere, but I felt this could be an interesting interview question. Therefore tagged.










share|improve this question























  • Why won't the pairwise merge of files work?

    – user97370
    Feb 20 '11 at 8:03






  • 7





    For the record, this is known as balance line or k-way merge. Balance line algorithms usually have O(kn) time complexity where k is the number of files and n is the total number of items, whereas heap k-way merges are usually O(n log k). Both algorithms have O(k) space complexity.

    – user557219
    Feb 20 '11 at 8:07













  • @paul, ok I must admit that I was wrong when I said that merging 2 files at a time won't work due to space considerations. But in I think it would be a very slow algo, that is why it won't work.

    – bits
    Feb 20 '11 at 8:17











  • @Bavarious can you say why you think merging like this is O(kn). It seems to me like it's O(n log k) (since merging each pair is O(n) and you have to do it O(log k) times to reduce down to a single file). It uses only O(1) space too.

    – user97370
    Feb 20 '11 at 9:06











  • @PaulHankin Balance line keeps an unsorted array (instead of a heap) with the last key read from each input file, hence k in both O(kn) and O(k). It is good enough for small k.

    – user557219
    Feb 20 '11 at 9:11
















76












76








76


56






A 2-way merge is widely studied as a part of Mergesort algorithm.
But I am interested to find out the best way one can perform an N-way merge?



Lets say, I have N files which have sorted 1 million integers each.
I have to merge them into 1 single file which will have those 100 million sorted integers.



Please keep in mind that use case for this problem is actually external sorting which is disk based. Therefore, in real scenarios there would be memory limitation as well. So a naive approach of merging 2 files at a time (99 times) won't work. Lets say we have only a small sliding window of memory available for each array.



I am not sure if there is already a standardized solution to this N-way merge. (Googling didn't tell me much).



But if you know if a good n-way merge algorithm, please post algo/link.



Time complexity: If we greatly increase the number of files (N) to be merged, how would that affect the time complexity of your algorithm?



Thanks for your answers.



I haven't been asked this anywhere, but I felt this could be an interesting interview question. Therefore tagged.










share|improve this question














A 2-way merge is widely studied as a part of Mergesort algorithm.
But I am interested to find out the best way one can perform an N-way merge?



Lets say, I have N files which have sorted 1 million integers each.
I have to merge them into 1 single file which will have those 100 million sorted integers.



Please keep in mind that use case for this problem is actually external sorting which is disk based. Therefore, in real scenarios there would be memory limitation as well. So a naive approach of merging 2 files at a time (99 times) won't work. Lets say we have only a small sliding window of memory available for each array.



I am not sure if there is already a standardized solution to this N-way merge. (Googling didn't tell me much).



But if you know if a good n-way merge algorithm, please post algo/link.



Time complexity: If we greatly increase the number of files (N) to be merged, how would that affect the time complexity of your algorithm?



Thanks for your answers.



I haven't been asked this anywhere, but I felt this could be an interesting interview question. Therefore tagged.







algorithm merge






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Feb 20 '11 at 7:55









bitsbits

4,25963752




4,25963752













  • Why won't the pairwise merge of files work?

    – user97370
    Feb 20 '11 at 8:03






  • 7





    For the record, this is known as balance line or k-way merge. Balance line algorithms usually have O(kn) time complexity where k is the number of files and n is the total number of items, whereas heap k-way merges are usually O(n log k). Both algorithms have O(k) space complexity.

    – user557219
    Feb 20 '11 at 8:07













  • @paul, ok I must admit that I was wrong when I said that merging 2 files at a time won't work due to space considerations. But in I think it would be a very slow algo, that is why it won't work.

    – bits
    Feb 20 '11 at 8:17











  • @Bavarious can you say why you think merging like this is O(kn). It seems to me like it's O(n log k) (since merging each pair is O(n) and you have to do it O(log k) times to reduce down to a single file). It uses only O(1) space too.

    – user97370
    Feb 20 '11 at 9:06











  • @PaulHankin Balance line keeps an unsorted array (instead of a heap) with the last key read from each input file, hence k in both O(kn) and O(k). It is good enough for small k.

    – user557219
    Feb 20 '11 at 9:11





















  • Why won't the pairwise merge of files work?

    – user97370
    Feb 20 '11 at 8:03






  • 7





    For the record, this is known as balance line or k-way merge. Balance line algorithms usually have O(kn) time complexity where k is the number of files and n is the total number of items, whereas heap k-way merges are usually O(n log k). Both algorithms have O(k) space complexity.

    – user557219
    Feb 20 '11 at 8:07













  • @paul, ok I must admit that I was wrong when I said that merging 2 files at a time won't work due to space considerations. But in I think it would be a very slow algo, that is why it won't work.

    – bits
    Feb 20 '11 at 8:17











  • @Bavarious can you say why you think merging like this is O(kn). It seems to me like it's O(n log k) (since merging each pair is O(n) and you have to do it O(log k) times to reduce down to a single file). It uses only O(1) space too.

    – user97370
    Feb 20 '11 at 9:06











  • @PaulHankin Balance line keeps an unsorted array (instead of a heap) with the last key read from each input file, hence k in both O(kn) and O(k). It is good enough for small k.

    – user557219
    Feb 20 '11 at 9:11



















Why won't the pairwise merge of files work?

– user97370
Feb 20 '11 at 8:03





Why won't the pairwise merge of files work?

– user97370
Feb 20 '11 at 8:03




7




7





For the record, this is known as balance line or k-way merge. Balance line algorithms usually have O(kn) time complexity where k is the number of files and n is the total number of items, whereas heap k-way merges are usually O(n log k). Both algorithms have O(k) space complexity.

– user557219
Feb 20 '11 at 8:07







For the record, this is known as balance line or k-way merge. Balance line algorithms usually have O(kn) time complexity where k is the number of files and n is the total number of items, whereas heap k-way merges are usually O(n log k). Both algorithms have O(k) space complexity.

– user557219
Feb 20 '11 at 8:07















@paul, ok I must admit that I was wrong when I said that merging 2 files at a time won't work due to space considerations. But in I think it would be a very slow algo, that is why it won't work.

– bits
Feb 20 '11 at 8:17





@paul, ok I must admit that I was wrong when I said that merging 2 files at a time won't work due to space considerations. But in I think it would be a very slow algo, that is why it won't work.

– bits
Feb 20 '11 at 8:17













@Bavarious can you say why you think merging like this is O(kn). It seems to me like it's O(n log k) (since merging each pair is O(n) and you have to do it O(log k) times to reduce down to a single file). It uses only O(1) space too.

– user97370
Feb 20 '11 at 9:06





@Bavarious can you say why you think merging like this is O(kn). It seems to me like it's O(n log k) (since merging each pair is O(n) and you have to do it O(log k) times to reduce down to a single file). It uses only O(1) space too.

– user97370
Feb 20 '11 at 9:06













@PaulHankin Balance line keeps an unsorted array (instead of a heap) with the last key read from each input file, hence k in both O(kn) and O(k). It is good enough for small k.

– user557219
Feb 20 '11 at 9:11







@PaulHankin Balance line keeps an unsorted array (instead of a heap) with the last key read from each input file, hence k in both O(kn) and O(k). It is good enough for small k.

– user557219
Feb 20 '11 at 9:11














8 Answers
8






active

oldest

votes


















78














How about the following idea:




  1. Create a priority queue



  2. Iterate through each file f


    1. enqueue the pair (nextNumberIn(f), f) using the first value as priority key





  3. While queue not empty


    1. dequeue head (m, f) of queue

    2. output m

    3. if f not depleted


      1. enqueue (nextNumberIn(f), f)






Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.



Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).






share|improve this answer


























  • Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

    – bits
    Feb 20 '11 at 8:45






  • 1





    @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

    – Programmer
    Jul 31 '11 at 1:40








  • 7





    "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

    – aioobe
    Jul 31 '11 at 6:38








  • 2





    @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

    – enobayram
    Jul 6 '13 at 7:25








  • 1





    @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

    – aioobe
    May 21 '17 at 13:50



















12














Search for "Polyphase merge", check out classics - Donald Knuth & E.H.Friend.



Also, you may want to take a look at the proposed Smart Block Merging by Seyedafsari & Hasanzadeh, that, similarly to earlier suggestions, uses priority queues.



Another interesting reasonsing is In Place Merging Algorithm by Kim & Kutzner.



I also recommend this paper by Vitter: External memory algorithms and data structures: dealing with massive data.






share|improve this answer





















  • 2





    Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

    – Jeremy Salwen
    May 29 '12 at 5:58











  • Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

    – Grigori Melnik
    May 29 '12 at 7:17








  • 1





    This is the new link: waset.org/publications/9638/…

    – Hengameh
    Jun 20 '15 at 1:56



















6














One simple idea is to keep a priority queue of the ranges to merge, stored in such a way that the range with the smallest first element is removed first from the queue. You can then do an N-way merge as follows:




  1. Insert all of the ranges into the priority queue, excluding empty ranges.

  2. While the priority queue is not empty:

    1. Dequeue the smallest element from the queue.

    2. Append the first element of this range to the output sequence.

    3. If it's nonempty, insert the rest of the sequence back into the priority queue.




The correctness of this algorithm is essentially a generalization of the proof that a 2-way merge works correctly - if you always add the smallest element from any range, and all the ranges are sorted, you end up with the sequence as a whole sorted.



The runtime complexity of this algorithm can be found as follows. Let M be the total number of elements in all the sequences. If we use a binary heap, then we do at most O(M) insertions and O(M) deletions from the priority queue, since for each element written to the output sequence there's a dequeue to pull out the smallest sequence, followed by an enqueue to put the rest of the sequence back into the queue. Each of these steps takes O(lg N) operations, because insertion or deletion from a binary heap with N elements in it takes O(lg N) time. This gives a net runtime of O(M lg N), which grows less than linearly with the number of input sequences.



There may be a way to get this even faster, but this seems like a pretty good solution. The memory usage is O(N) because we need O(N) overhead for the binary heap. If we implement the binary heap by storing pointers to the sequences rather than the sequences themselves, this shouldn't be too much of a problem unless you have a truly ridiculous number of sequences to merge. In that case, just merge them in groups that do fit into memory, then merge all the results.



Hope this helps!






share|improve this answer































    2














    A simple approach to Merging k sorted arrays (each of length n) requires O(n k^2) time and not O(nk) time. As when you merge first 2 arrays it takes 2n time, then when you merge third with the output , it takes 3n time as now we are merging two array of length 2n and n. Now when we merge this output with the fourth one,this merge requires 4n time.Thus the last merge (when we are adding the kth array to our already sorted array ) requires k*n time.Thus total time required is 2n+ 3n + 4n +...k*n which is O(n k^2).



    It looks like we can do it in O(kn) time but it is not so because each time our array which we are merging is increasing in size.

    Though we can achieve a better bound using divide and conquer. I am still working on that and post a solution if I find one.






    share|improve this answer































      1














      See http://en.wikipedia.org/wiki/External_sorting. Here is my take on the heap based k-way merge, using a buffered read from the sources to emulate I/O reduction:



      public class KWayMerger<T>
      {
      private readonly IList<T> _sources;
      private readonly int _bufferSize;
      private readonly MinHeap<MergeValue<T>> _mergeHeap;
      private readonly int _indices;

      public KWayMerger(IList<T> sources, int bufferSize, Comparer<T> comparer = null)
      {
      if (sources == null) throw new ArgumentNullException("sources");

      _sources = sources;
      _bufferSize = bufferSize;

      _mergeHeap = new MinHeap<MergeValue<T>>(
      new MergeComparer<T>(comparer ?? Comparer<T>.Default));
      _indices = new int[sources.Count];
      }

      public T Merge()
      {
      for (int i = 0; i <= _sources.Count - 1; i++)
      AddToMergeHeap(i);

      var merged = new T[_sources.Sum(s => s.Length)];
      int mergeIndex = 0;

      while (_mergeHeap.Count > 0)
      {
      var min = _mergeHeap.ExtractDominating();
      merged[mergeIndex++] = min.Value;
      if (min.Source != -1) //the last item of the source was extracted
      AddToMergeHeap(min.Source);
      }

      return merged;
      }

      private void AddToMergeHeap(int sourceIndex)
      {
      var source = _sources[sourceIndex];
      var start = _indices[sourceIndex];
      var end = Math.Min(start + _bufferSize - 1, source.Length - 1);

      if (start > source.Length - 1)
      return; //we're done with this source

      for (int i = start; i <= end - 1; i++)
      _mergeHeap.Add(new MergeValue<T>(-1, source[i]));

      //only the last item should trigger the next buffered read
      _mergeHeap.Add(new MergeValue<T>(sourceIndex, source[end]));

      _indices[sourceIndex] += _bufferSize; //we may have added less items,
      //but if we did we've reached the end of the source so it doesn't matter
      }
      }

      internal class MergeValue<T>
      {
      public int Source { get; private set; }
      public T Value { get; private set; }

      public MergeValue(int source, T value)
      {
      Value = value;
      Source = source;
      }
      }

      internal class MergeComparer<T> : IComparer<MergeValue<T>>
      {
      public Comparer<T> Comparer { get; private set; }

      public MergeComparer(Comparer<T> comparer)
      {
      if (comparer == null) throw new ArgumentNullException("comparer");
      Comparer = comparer;
      }

      public int Compare(MergeValue<T> x, MergeValue<T> y)
      {
      Debug.Assert(x != null && y != null);
      return Comparer.Compare(x.Value, y.Value);
      }
      }


      Here is one possible implementation of MinHeap<T>. Some tests:



      [TestMethod]
      public void TestKWaySort()
      {
      var rand = new Random();
      for (int i = 0; i < 10; i++)
      AssertKwayMerge(rand);
      }

      private static void AssertKwayMerge(Random rand)
      {
      var sources = new
      {
      GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
      GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
      GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
      GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
      };
      Assert.IsTrue(new KWayMerger<int>(sources, 20).Merge().SequenceEqual(sources.SelectMany(s => s).OrderBy(i => i)));
      }

      public static IEnumerable<int> GenerateRandomCollection(Random rand, int minLength, int maxLength, int min = 0, int max = int.MaxValue)
      {
      return Enumerable.Repeat(0, rand.Next(minLength, maxLength)).Select(i => rand.Next(min, max));
      }





      share|improve this answer


























      • What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

        – Hengameh
        Jun 20 '15 at 2:02






      • 1





        @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

        – Ohad Schneider
        Jun 20 '15 at 7:11



















      1














      I wrote this STL-style piece of code that does N-way merge and thought I'd post it here to help prevent others from reinventing the wheel. :)



      Warning: it's only mildly tested. Test before use. :)



      You can use it like this:



      #include <vector>

      int main()
      {
      std::vector<std::vector<int> > v;
      std::vector<std::vector<int>::iterator> vout;
      std::vector<int> v1;
      std::vector<int> v2;
      v1.push_back(1);
      v1.push_back(2);
      v1.push_back(3);
      v2.push_back(0);
      v2.push_back(1);
      v2.push_back(2);
      v.push_back(v1);
      v.push_back(v2);
      multiway_merge(v.begin(), v.end(), std::back_inserter(vout), false);
      }


      It also allows using pairs of iterators instead of the containers themselves.



      If you use Boost.Range, you can remove some of the boilerplate code.



      The code:



      #include <algorithm>
      #include <functional> // std::less
      #include <iterator>
      #include <queue> // std::priority_queue
      #include <utility> // std::pair
      #include <vector>

      template<class OutIt>
      struct multiway_merge_value_insert_iterator : public std::iterator<
      std::output_iterator_tag, OutIt, ptrdiff_t
      >
      {
      OutIt it;
      multiway_merge_value_insert_iterator(OutIt const it = OutIt())
      : it(it) { }

      multiway_merge_value_insert_iterator &operator++(int)
      { return *this; }

      multiway_merge_value_insert_iterator &operator++()
      { return *this; }

      multiway_merge_value_insert_iterator &operator *()
      { return *this; }

      template<class It>
      multiway_merge_value_insert_iterator &operator =(It const i)
      {
      *this->it = *i;
      ++this->it;
      return *this;
      }
      };

      template<class OutIt>
      multiway_merge_value_insert_iterator<OutIt>
      multiway_merge_value_inserter(OutIt const it)
      { return multiway_merge_value_insert_iterator<OutIt>(it); };

      template<class Less>
      struct multiway_merge_value_less : private Less
      {
      multiway_merge_value_less(Less const &less) : Less(less) { }
      template<class It1, class It2>
      bool operator()(
      std::pair<It1, It1> const &b /* inverted */,
      std::pair<It2, It2> const &a) const
      {
      return b.first != b.second && (
      a.first == a.second ||
      this->Less::operator()(*a.first, *b.first));
      }
      };

      struct multiway_merge_default_less
      {
      template<class T>
      bool operator()(T const &a, T const &b) const
      { return std::less<T>()(a, b); }
      };

      template<class R>
      struct multiway_merge_range_iterator
      { typedef typename R::iterator type; };

      template<class R>
      struct multiway_merge_range_iterator<R const>
      { typedef typename R::const_iterator type; };

      template<class It>
      struct multiway_merge_range_iterator<std::pair<It, It> >
      { typedef It type; };

      template<class R>
      typename R::iterator multiway_merge_range_begin(R &r)
      { return r.begin(); }

      template<class R>
      typename R::iterator multiway_merge_range_end(R &r)
      { return r.end(); }

      template<class R>
      typename R::const_iterator multiway_merge_range_begin(R const &r)
      { return r.begin(); }

      template<class R>
      typename R::const_iterator multiway_merge_range_end(R const &r)
      { return r.end(); }

      template<class It>
      It multiway_merge_range_begin(std::pair<It, It> const &r)
      { return r.first; }

      template<class It>
      It multiway_merge_range_end(std::pair<It, It> const &r)
      { return r.second; }

      template<class It, class OutIt, class Less, class PQ>
      OutIt multiway_merge(
      It begin, It const end, OutIt out, Less const &less,
      PQ &pq, bool const distinct = false)
      {
      while (begin != end)
      {
      pq.push(typename PQ::value_type(
      multiway_merge_range_begin(*begin),
      multiway_merge_range_end(*begin)));
      ++begin;
      }
      while (!pq.empty())
      {
      typename PQ::value_type top = pq.top();
      pq.pop();
      if (top.first != top.second)
      {
      while (!pq.empty() && pq.top().first == pq.top().second)
      { pq.pop(); }
      if (!distinct ||
      pq.empty() ||
      less(*pq.top().first, *top.first) ||
      less(*top.first, *pq.top().first))
      {
      *out = top.first;
      ++out;
      }

      ++top.first;
      pq.push(top);
      }
      }
      return out;
      }

      template<class It, class OutIt, class Less>
      OutIt multiway_merge(
      It const begin, It const end, OutIt out, Less const &less,
      bool const distinct = false)
      {
      typedef typename multiway_merge_range_iterator<
      typename std::iterator_traits<It>::value_type
      >::type SubIt;
      if (std::distance(begin, end) < 16)
      {
      typedef std::vector<std::pair<SubIt, SubIt> > Remaining;
      Remaining remaining;
      remaining.reserve(
      static_cast<size_t>(std::distance(begin, end)));
      for (It i = begin; i != end; ++i)
      {
      if (multiway_merge_range_begin(*i) !=
      multiway_merge_range_end(*i))
      {
      remaining.push_back(std::make_pair(
      multiway_merge_range_begin(*i),
      multiway_merge_range_end(*i)));
      }
      }
      while (!remaining.empty())
      {
      typename Remaining::iterator smallest =
      remaining.begin();
      for (typename Remaining::iterator
      i = remaining.begin();
      i != remaining.end();
      )
      {
      if (less(*i->first, *smallest->first))
      {
      smallest = i;
      ++i;
      }
      else if (distinct && i != smallest &&
      !less(
      *smallest->first,
      *i->first))
      {
      i = remaining.erase(i);
      }
      else { ++i; }
      }
      *out = smallest->first;
      ++out;
      ++smallest->first;
      if (smallest->first == smallest->second)
      { smallest = remaining.erase(smallest); }
      }
      return out;
      }
      else
      {
      std::priority_queue<
      std::pair<SubIt, SubIt>,
      std::vector<std::pair<SubIt, SubIt> >,
      multiway_merge_value_less<Less>
      > q((multiway_merge_value_less<Less>(less)));
      return multiway_merge(begin, end, out, less, q, distinct);
      }
      }

      template<class It, class OutIt>
      OutIt multiway_merge(
      It const begin, It const end, OutIt const out,
      bool const distinct = false)
      {
      return multiway_merge(
      begin, end, out,
      multiway_merge_default_less(), distinct);
      }





      share|improve this answer

































        1














        Here is my implementation using MinHeap...

        package merging;

        import java.io.BufferedReader;
        import java.io.BufferedWriter;
        import java.io.File;
        import java.io.FileReader;
        import java.io.FileWriter;
        import java.io.IOException;
        import java.io.PrintWriter;


        public class N_Way_Merge {

        int No_of_files=0;
        String listString;
        int listIndex;
        PrintWriter pw;
        private String fileDir = "D:\XMLParsing_Files\Extracted_Data";
        private File fileList;
        private BufferedReader readers;

        public static void main(String args) throws IOException {

        N_Way_Merge nwm=new N_Way_Merge();

        long start= System.currentTimeMillis();

        try {

        nwm.createFileList();

        nwm.createReaders();
        nwm.createMinHeap();
        }
        finally {
        nwm.pw.flush();
        nwm.pw.close();
        for (BufferedReader readers : nwm.readers) {

        readers.close();

        }
        }
        long end = System.currentTimeMillis();
        System.out.println("Files merged into a single file.nTime taken: "+((end-start)/1000)+"secs");
        }

        public void createFileList() throws IOException {
        //creates a list of sorted files present in a particular directory
        File folder = new File(fileDir);
        fileList = folder.listFiles();
        No_of_files=fileList.length;
        assign();
        System.out.println("No. of files - "+ No_of_files);

        }

        public void assign() throws IOException
        {
        listString = new String[No_of_files];
        listIndex = new int[No_of_files];
        pw = new PrintWriter(new BufferedWriter(new FileWriter("D:\XMLParsing_Files\Final.txt", true)));
        }

        public void createReaders() throws IOException {
        //creates array of BufferedReaders to read the files
        readers = new BufferedReader[No_of_files];

        for(int i=0;i<No_of_files;++i)
        {
        readers[i]=new BufferedReader(new FileReader(fileList[i]));
        }
        }

        public void createMinHeap() throws IOException {

        for(int i=0;i<No_of_files;i++)
        {
        listString[i]=readers[i].readLine();
        listIndex[i]=i;
        }

        WriteToFile(listString,listIndex);

        }

        public void WriteToFile(String listString,int listIndex) throws IOException{

        BuildHeap_forFirstTime(listString, listIndex);
        while(!(listString[0].equals("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz")))
        {
        pw.println(listString[0]);
        listString[0]=readers[listIndex[0]].readLine();

        MinHeapify(listString,listIndex,0);
        }

        }
        public void BuildHeap_forFirstTime(String listString,int listIndex){

        for(int i=(No_of_files/2)-1;i>=0;--i)
        MinHeapify(listString,listIndex,i);

        }

        public void MinHeapify(String listString,int listIndex,int index){

        int left=index*2 + 1;
        int right=left + 1;
        int smallest=index;
        int HeapSize=No_of_files;
        if(left <= HeapSize-1 && listString[left]!=null && (listString[left].compareTo(listString[index])) < 0)
        smallest = left;

        if(right <= HeapSize-1 && listString[right]!=null && (listString[right].compareTo(listString[smallest])) < 0)
        smallest=right;



        if(smallest!=index)
        {
        String temp=listString[index];
        listString[index]=listString[smallest];
        listString[smallest]=temp;

        listIndex[smallest]^=listIndex[index];
        listIndex[index]^=listIndex[smallest];
        listIndex[smallest]^=listIndex[index];

        MinHeapify(listString,listIndex,smallest);
        }

        }


        }






        share|improve this answer































          0














          Java implementation of min heap algorithm for merging k sorted arrays:



          public class MergeKSorted {

          /**
          * helper object to store min value of each array in a priority queue,
          * the kth array and the index into kth array
          *
          */
          static class PQNode implements Comparable<PQNode>{
          int value;
          int kth = 0;
          int indexKth = 0;

          public PQNode(int value, int kth, int indexKth) {
          this.value = value;
          this.kth = kth;
          this.indexKth = indexKth;
          }
          @Override
          public int compareTo(PQNode o) {
          if(o != null) {
          return Integer.valueOf(value).compareTo(Integer.valueOf(o.value));
          }
          else return 0;
          }

          @Override
          public String toString() {
          return value+" "+kth+" "+indexKth;
          }
          }
          public static void mergeKSorted(int sortedArrays) {
          int k = sortedArrays.length;
          int resultCtr = 0;
          int totalSize = 0;
          PriorityQueue<PQNode> pq = new PriorityQueue<>();
          for(int i=0; i<k; i++) {
          int kthArray = sortedArrays[i];
          totalSize+=kthArray.length;
          if(kthArray.length > 0) {
          PQNode temp = new PQNode(kthArray[0], i, 0);
          pq.add(temp);
          }
          }
          int result = new int[totalSize];
          while(!pq.isEmpty()) {
          PQNode temp = pq.poll();
          int kthArray = sortedArrays[temp.kth];
          result[resultCtr] = temp.value;
          resultCtr++;
          temp.indexKth++;
          if(temp.indexKth < kthArray.length) {
          temp = new PQNode(kthArray[temp.indexKth], temp.kth, temp.indexKth);
          pq.add(temp);
          }

          }
          print(result);
          }

          public static void print(int a) {
          StringBuilder sb = new StringBuilder();
          for(int v : a) {
          sb.append(v).append(" ");
          }
          System.out.println(sb);
          }

          public static void main(String args) {
          int sortedA = {
          {3,4,6,9},
          {4,6,8,9,12},
          {3,4,9},
          {1,4,9}
          };
          mergeKSorted(sortedA);
          }

          }





          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5055909%2falgorithm-for-n-way-merge%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            8 Answers
            8






            active

            oldest

            votes








            8 Answers
            8






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            78














            How about the following idea:




            1. Create a priority queue



            2. Iterate through each file f


              1. enqueue the pair (nextNumberIn(f), f) using the first value as priority key





            3. While queue not empty


              1. dequeue head (m, f) of queue

              2. output m

              3. if f not depleted


                1. enqueue (nextNumberIn(f), f)






            Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.



            Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).






            share|improve this answer


























            • Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

              – bits
              Feb 20 '11 at 8:45






            • 1





              @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

              – Programmer
              Jul 31 '11 at 1:40








            • 7





              "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

              – aioobe
              Jul 31 '11 at 6:38








            • 2





              @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

              – enobayram
              Jul 6 '13 at 7:25








            • 1





              @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

              – aioobe
              May 21 '17 at 13:50
















            78














            How about the following idea:




            1. Create a priority queue



            2. Iterate through each file f


              1. enqueue the pair (nextNumberIn(f), f) using the first value as priority key





            3. While queue not empty


              1. dequeue head (m, f) of queue

              2. output m

              3. if f not depleted


                1. enqueue (nextNumberIn(f), f)






            Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.



            Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).






            share|improve this answer


























            • Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

              – bits
              Feb 20 '11 at 8:45






            • 1





              @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

              – Programmer
              Jul 31 '11 at 1:40








            • 7





              "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

              – aioobe
              Jul 31 '11 at 6:38








            • 2





              @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

              – enobayram
              Jul 6 '13 at 7:25








            • 1





              @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

              – aioobe
              May 21 '17 at 13:50














            78












            78








            78







            How about the following idea:




            1. Create a priority queue



            2. Iterate through each file f


              1. enqueue the pair (nextNumberIn(f), f) using the first value as priority key





            3. While queue not empty


              1. dequeue head (m, f) of queue

              2. output m

              3. if f not depleted


                1. enqueue (nextNumberIn(f), f)






            Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.



            Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).






            share|improve this answer















            How about the following idea:




            1. Create a priority queue



            2. Iterate through each file f


              1. enqueue the pair (nextNumberIn(f), f) using the first value as priority key





            3. While queue not empty


              1. dequeue head (m, f) of queue

              2. output m

              3. if f not depleted


                1. enqueue (nextNumberIn(f), f)






            Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.



            Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Sep 22 '15 at 17:43

























            answered Feb 20 '11 at 8:03









            aioobeaioobe

            325k76691748




            325k76691748













            • Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

              – bits
              Feb 20 '11 at 8:45






            • 1





              @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

              – Programmer
              Jul 31 '11 at 1:40








            • 7





              "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

              – aioobe
              Jul 31 '11 at 6:38








            • 2





              @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

              – enobayram
              Jul 6 '13 at 7:25








            • 1





              @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

              – aioobe
              May 21 '17 at 13:50



















            • Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

              – bits
              Feb 20 '11 at 8:45






            • 1





              @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

              – Programmer
              Jul 31 '11 at 1:40








            • 7





              "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

              – aioobe
              Jul 31 '11 at 6:38








            • 2





              @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

              – enobayram
              Jul 6 '13 at 7:25








            • 1





              @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

              – aioobe
              May 21 '17 at 13:50

















            Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

            – bits
            Feb 20 '11 at 8:45





            Very neat. I was thinking on similar lines, except the pairing of number with file didn't occur to me. Thanks. Accepting.

            – bits
            Feb 20 '11 at 8:45




            1




            1





            @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

            – Programmer
            Jul 31 '11 at 1:40







            @aioobe: Your soulution is neat but you are making a lot of read calls which can hurt efficiency. For example, in item 3, for every while iteration, you make a read call for the next element in f . I think it will be better if you change the if condition to follows: if f not present in queue and f not depleted. This way you will make a read only if no element of f is present in the queue. Moreover, when you do make this read, you can read a chunk of numbers in f at once.

            – Programmer
            Jul 31 '11 at 1:40






            7




            7





            "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

            – aioobe
            Jul 31 '11 at 6:38







            "if f not present in queue", it's never present after dequeuing since there's always at most one value from each file present in the queue. Regarding your second comment: Your suggestion doesn't improve the complexity (except that it makes the answer more complex to read!) Besides, keep in mind that this is pseudo-code. It can very well be implemented with some buffered reader which reads larger chunks and caches them.

            – aioobe
            Jul 31 '11 at 6:38






            2




            2





            @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

            – enobayram
            Jul 6 '13 at 7:25







            @Programmer, I think you have a good point regarding number of reads, but you don't really have to implement "if f not present in queue"; you can simply buffer f and use aioobe's algorithm as is, reading the values of f through the buffer.

            – enobayram
            Jul 6 '13 at 7:25






            1




            1





            @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

            – aioobe
            May 21 '17 at 13:50





            @RestlessC0bra, no, step two inserts the first number in each file. In step 3 (the while loop) a value is dequeued and the next value from that file is enqueued (unless that file is depleted). Still unclear?

            – aioobe
            May 21 '17 at 13:50













            12














            Search for "Polyphase merge", check out classics - Donald Knuth & E.H.Friend.



            Also, you may want to take a look at the proposed Smart Block Merging by Seyedafsari & Hasanzadeh, that, similarly to earlier suggestions, uses priority queues.



            Another interesting reasonsing is In Place Merging Algorithm by Kim & Kutzner.



            I also recommend this paper by Vitter: External memory algorithms and data structures: dealing with massive data.






            share|improve this answer





















            • 2





              Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

              – Jeremy Salwen
              May 29 '12 at 5:58











            • Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

              – Grigori Melnik
              May 29 '12 at 7:17








            • 1





              This is the new link: waset.org/publications/9638/…

              – Hengameh
              Jun 20 '15 at 1:56
















            12














            Search for "Polyphase merge", check out classics - Donald Knuth & E.H.Friend.



            Also, you may want to take a look at the proposed Smart Block Merging by Seyedafsari & Hasanzadeh, that, similarly to earlier suggestions, uses priority queues.



            Another interesting reasonsing is In Place Merging Algorithm by Kim & Kutzner.



            I also recommend this paper by Vitter: External memory algorithms and data structures: dealing with massive data.






            share|improve this answer





















            • 2





              Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

              – Jeremy Salwen
              May 29 '12 at 5:58











            • Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

              – Grigori Melnik
              May 29 '12 at 7:17








            • 1





              This is the new link: waset.org/publications/9638/…

              – Hengameh
              Jun 20 '15 at 1:56














            12












            12








            12







            Search for "Polyphase merge", check out classics - Donald Knuth & E.H.Friend.



            Also, you may want to take a look at the proposed Smart Block Merging by Seyedafsari & Hasanzadeh, that, similarly to earlier suggestions, uses priority queues.



            Another interesting reasonsing is In Place Merging Algorithm by Kim & Kutzner.



            I also recommend this paper by Vitter: External memory algorithms and data structures: dealing with massive data.






            share|improve this answer















            Search for "Polyphase merge", check out classics - Donald Knuth & E.H.Friend.



            Also, you may want to take a look at the proposed Smart Block Merging by Seyedafsari & Hasanzadeh, that, similarly to earlier suggestions, uses priority queues.



            Another interesting reasonsing is In Place Merging Algorithm by Kim & Kutzner.



            I also recommend this paper by Vitter: External memory algorithms and data structures: dealing with massive data.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited May 29 '12 at 7:16

























            answered Feb 20 '11 at 8:44









            Grigori MelnikGrigori Melnik

            3,68622637




            3,68622637








            • 2





              Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

              – Jeremy Salwen
              May 29 '12 at 5:58











            • Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

              – Grigori Melnik
              May 29 '12 at 7:17








            • 1





              This is the new link: waset.org/publications/9638/…

              – Hengameh
              Jun 20 '15 at 1:56














            • 2





              Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

              – Jeremy Salwen
              May 29 '12 at 5:58











            • Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

              – Grigori Melnik
              May 29 '12 at 7:17








            • 1





              This is the new link: waset.org/publications/9638/…

              – Hengameh
              Jun 20 '15 at 1:56








            2




            2





            Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

            – Jeremy Salwen
            May 29 '12 at 5:58





            Your Smart Block Merging link is wrong. It's some article about supply chains in Estonia.

            – Jeremy Salwen
            May 29 '12 at 5:58













            Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

            – Grigori Melnik
            May 29 '12 at 7:17







            Jeremy, thanks for pointing it out. In 2012, the waset.org host seems to have overriden these files (originally, published in 2010) with new conference proceedings. I'm unable to locate the old article. If anybody has it, please post/link.

            – Grigori Melnik
            May 29 '12 at 7:17






            1




            1





            This is the new link: waset.org/publications/9638/…

            – Hengameh
            Jun 20 '15 at 1:56





            This is the new link: waset.org/publications/9638/…

            – Hengameh
            Jun 20 '15 at 1:56











            6














            One simple idea is to keep a priority queue of the ranges to merge, stored in such a way that the range with the smallest first element is removed first from the queue. You can then do an N-way merge as follows:




            1. Insert all of the ranges into the priority queue, excluding empty ranges.

            2. While the priority queue is not empty:

              1. Dequeue the smallest element from the queue.

              2. Append the first element of this range to the output sequence.

              3. If it's nonempty, insert the rest of the sequence back into the priority queue.




            The correctness of this algorithm is essentially a generalization of the proof that a 2-way merge works correctly - if you always add the smallest element from any range, and all the ranges are sorted, you end up with the sequence as a whole sorted.



            The runtime complexity of this algorithm can be found as follows. Let M be the total number of elements in all the sequences. If we use a binary heap, then we do at most O(M) insertions and O(M) deletions from the priority queue, since for each element written to the output sequence there's a dequeue to pull out the smallest sequence, followed by an enqueue to put the rest of the sequence back into the queue. Each of these steps takes O(lg N) operations, because insertion or deletion from a binary heap with N elements in it takes O(lg N) time. This gives a net runtime of O(M lg N), which grows less than linearly with the number of input sequences.



            There may be a way to get this even faster, but this seems like a pretty good solution. The memory usage is O(N) because we need O(N) overhead for the binary heap. If we implement the binary heap by storing pointers to the sequences rather than the sequences themselves, this shouldn't be too much of a problem unless you have a truly ridiculous number of sequences to merge. In that case, just merge them in groups that do fit into memory, then merge all the results.



            Hope this helps!






            share|improve this answer




























              6














              One simple idea is to keep a priority queue of the ranges to merge, stored in such a way that the range with the smallest first element is removed first from the queue. You can then do an N-way merge as follows:




              1. Insert all of the ranges into the priority queue, excluding empty ranges.

              2. While the priority queue is not empty:

                1. Dequeue the smallest element from the queue.

                2. Append the first element of this range to the output sequence.

                3. If it's nonempty, insert the rest of the sequence back into the priority queue.




              The correctness of this algorithm is essentially a generalization of the proof that a 2-way merge works correctly - if you always add the smallest element from any range, and all the ranges are sorted, you end up with the sequence as a whole sorted.



              The runtime complexity of this algorithm can be found as follows. Let M be the total number of elements in all the sequences. If we use a binary heap, then we do at most O(M) insertions and O(M) deletions from the priority queue, since for each element written to the output sequence there's a dequeue to pull out the smallest sequence, followed by an enqueue to put the rest of the sequence back into the queue. Each of these steps takes O(lg N) operations, because insertion or deletion from a binary heap with N elements in it takes O(lg N) time. This gives a net runtime of O(M lg N), which grows less than linearly with the number of input sequences.



              There may be a way to get this even faster, but this seems like a pretty good solution. The memory usage is O(N) because we need O(N) overhead for the binary heap. If we implement the binary heap by storing pointers to the sequences rather than the sequences themselves, this shouldn't be too much of a problem unless you have a truly ridiculous number of sequences to merge. In that case, just merge them in groups that do fit into memory, then merge all the results.



              Hope this helps!






              share|improve this answer


























                6












                6








                6







                One simple idea is to keep a priority queue of the ranges to merge, stored in such a way that the range with the smallest first element is removed first from the queue. You can then do an N-way merge as follows:




                1. Insert all of the ranges into the priority queue, excluding empty ranges.

                2. While the priority queue is not empty:

                  1. Dequeue the smallest element from the queue.

                  2. Append the first element of this range to the output sequence.

                  3. If it's nonempty, insert the rest of the sequence back into the priority queue.




                The correctness of this algorithm is essentially a generalization of the proof that a 2-way merge works correctly - if you always add the smallest element from any range, and all the ranges are sorted, you end up with the sequence as a whole sorted.



                The runtime complexity of this algorithm can be found as follows. Let M be the total number of elements in all the sequences. If we use a binary heap, then we do at most O(M) insertions and O(M) deletions from the priority queue, since for each element written to the output sequence there's a dequeue to pull out the smallest sequence, followed by an enqueue to put the rest of the sequence back into the queue. Each of these steps takes O(lg N) operations, because insertion or deletion from a binary heap with N elements in it takes O(lg N) time. This gives a net runtime of O(M lg N), which grows less than linearly with the number of input sequences.



                There may be a way to get this even faster, but this seems like a pretty good solution. The memory usage is O(N) because we need O(N) overhead for the binary heap. If we implement the binary heap by storing pointers to the sequences rather than the sequences themselves, this shouldn't be too much of a problem unless you have a truly ridiculous number of sequences to merge. In that case, just merge them in groups that do fit into memory, then merge all the results.



                Hope this helps!






                share|improve this answer













                One simple idea is to keep a priority queue of the ranges to merge, stored in such a way that the range with the smallest first element is removed first from the queue. You can then do an N-way merge as follows:




                1. Insert all of the ranges into the priority queue, excluding empty ranges.

                2. While the priority queue is not empty:

                  1. Dequeue the smallest element from the queue.

                  2. Append the first element of this range to the output sequence.

                  3. If it's nonempty, insert the rest of the sequence back into the priority queue.




                The correctness of this algorithm is essentially a generalization of the proof that a 2-way merge works correctly - if you always add the smallest element from any range, and all the ranges are sorted, you end up with the sequence as a whole sorted.



                The runtime complexity of this algorithm can be found as follows. Let M be the total number of elements in all the sequences. If we use a binary heap, then we do at most O(M) insertions and O(M) deletions from the priority queue, since for each element written to the output sequence there's a dequeue to pull out the smallest sequence, followed by an enqueue to put the rest of the sequence back into the queue. Each of these steps takes O(lg N) operations, because insertion or deletion from a binary heap with N elements in it takes O(lg N) time. This gives a net runtime of O(M lg N), which grows less than linearly with the number of input sequences.



                There may be a way to get this even faster, but this seems like a pretty good solution. The memory usage is O(N) because we need O(N) overhead for the binary heap. If we implement the binary heap by storing pointers to the sequences rather than the sequences themselves, this shouldn't be too much of a problem unless you have a truly ridiculous number of sequences to merge. In that case, just merge them in groups that do fit into memory, then merge all the results.



                Hope this helps!







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Feb 20 '11 at 8:02









                templatetypedeftemplatetypedef

                263k69667893




                263k69667893























                    2














                    A simple approach to Merging k sorted arrays (each of length n) requires O(n k^2) time and not O(nk) time. As when you merge first 2 arrays it takes 2n time, then when you merge third with the output , it takes 3n time as now we are merging two array of length 2n and n. Now when we merge this output with the fourth one,this merge requires 4n time.Thus the last merge (when we are adding the kth array to our already sorted array ) requires k*n time.Thus total time required is 2n+ 3n + 4n +...k*n which is O(n k^2).



                    It looks like we can do it in O(kn) time but it is not so because each time our array which we are merging is increasing in size.

                    Though we can achieve a better bound using divide and conquer. I am still working on that and post a solution if I find one.






                    share|improve this answer




























                      2














                      A simple approach to Merging k sorted arrays (each of length n) requires O(n k^2) time and not O(nk) time. As when you merge first 2 arrays it takes 2n time, then when you merge third with the output , it takes 3n time as now we are merging two array of length 2n and n. Now when we merge this output with the fourth one,this merge requires 4n time.Thus the last merge (when we are adding the kth array to our already sorted array ) requires k*n time.Thus total time required is 2n+ 3n + 4n +...k*n which is O(n k^2).



                      It looks like we can do it in O(kn) time but it is not so because each time our array which we are merging is increasing in size.

                      Though we can achieve a better bound using divide and conquer. I am still working on that and post a solution if I find one.






                      share|improve this answer


























                        2












                        2








                        2







                        A simple approach to Merging k sorted arrays (each of length n) requires O(n k^2) time and not O(nk) time. As when you merge first 2 arrays it takes 2n time, then when you merge third with the output , it takes 3n time as now we are merging two array of length 2n and n. Now when we merge this output with the fourth one,this merge requires 4n time.Thus the last merge (when we are adding the kth array to our already sorted array ) requires k*n time.Thus total time required is 2n+ 3n + 4n +...k*n which is O(n k^2).



                        It looks like we can do it in O(kn) time but it is not so because each time our array which we are merging is increasing in size.

                        Though we can achieve a better bound using divide and conquer. I am still working on that and post a solution if I find one.






                        share|improve this answer













                        A simple approach to Merging k sorted arrays (each of length n) requires O(n k^2) time and not O(nk) time. As when you merge first 2 arrays it takes 2n time, then when you merge third with the output , it takes 3n time as now we are merging two array of length 2n and n. Now when we merge this output with the fourth one,this merge requires 4n time.Thus the last merge (when we are adding the kth array to our already sorted array ) requires k*n time.Thus total time required is 2n+ 3n + 4n +...k*n which is O(n k^2).



                        It looks like we can do it in O(kn) time but it is not so because each time our array which we are merging is increasing in size.

                        Though we can achieve a better bound using divide and conquer. I am still working on that and post a solution if I find one.







                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Dec 16 '12 at 19:26









                        ashish_ashish_

                        663




                        663























                            1














                            See http://en.wikipedia.org/wiki/External_sorting. Here is my take on the heap based k-way merge, using a buffered read from the sources to emulate I/O reduction:



                            public class KWayMerger<T>
                            {
                            private readonly IList<T> _sources;
                            private readonly int _bufferSize;
                            private readonly MinHeap<MergeValue<T>> _mergeHeap;
                            private readonly int _indices;

                            public KWayMerger(IList<T> sources, int bufferSize, Comparer<T> comparer = null)
                            {
                            if (sources == null) throw new ArgumentNullException("sources");

                            _sources = sources;
                            _bufferSize = bufferSize;

                            _mergeHeap = new MinHeap<MergeValue<T>>(
                            new MergeComparer<T>(comparer ?? Comparer<T>.Default));
                            _indices = new int[sources.Count];
                            }

                            public T Merge()
                            {
                            for (int i = 0; i <= _sources.Count - 1; i++)
                            AddToMergeHeap(i);

                            var merged = new T[_sources.Sum(s => s.Length)];
                            int mergeIndex = 0;

                            while (_mergeHeap.Count > 0)
                            {
                            var min = _mergeHeap.ExtractDominating();
                            merged[mergeIndex++] = min.Value;
                            if (min.Source != -1) //the last item of the source was extracted
                            AddToMergeHeap(min.Source);
                            }

                            return merged;
                            }

                            private void AddToMergeHeap(int sourceIndex)
                            {
                            var source = _sources[sourceIndex];
                            var start = _indices[sourceIndex];
                            var end = Math.Min(start + _bufferSize - 1, source.Length - 1);

                            if (start > source.Length - 1)
                            return; //we're done with this source

                            for (int i = start; i <= end - 1; i++)
                            _mergeHeap.Add(new MergeValue<T>(-1, source[i]));

                            //only the last item should trigger the next buffered read
                            _mergeHeap.Add(new MergeValue<T>(sourceIndex, source[end]));

                            _indices[sourceIndex] += _bufferSize; //we may have added less items,
                            //but if we did we've reached the end of the source so it doesn't matter
                            }
                            }

                            internal class MergeValue<T>
                            {
                            public int Source { get; private set; }
                            public T Value { get; private set; }

                            public MergeValue(int source, T value)
                            {
                            Value = value;
                            Source = source;
                            }
                            }

                            internal class MergeComparer<T> : IComparer<MergeValue<T>>
                            {
                            public Comparer<T> Comparer { get; private set; }

                            public MergeComparer(Comparer<T> comparer)
                            {
                            if (comparer == null) throw new ArgumentNullException("comparer");
                            Comparer = comparer;
                            }

                            public int Compare(MergeValue<T> x, MergeValue<T> y)
                            {
                            Debug.Assert(x != null && y != null);
                            return Comparer.Compare(x.Value, y.Value);
                            }
                            }


                            Here is one possible implementation of MinHeap<T>. Some tests:



                            [TestMethod]
                            public void TestKWaySort()
                            {
                            var rand = new Random();
                            for (int i = 0; i < 10; i++)
                            AssertKwayMerge(rand);
                            }

                            private static void AssertKwayMerge(Random rand)
                            {
                            var sources = new
                            {
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            };
                            Assert.IsTrue(new KWayMerger<int>(sources, 20).Merge().SequenceEqual(sources.SelectMany(s => s).OrderBy(i => i)));
                            }

                            public static IEnumerable<int> GenerateRandomCollection(Random rand, int minLength, int maxLength, int min = 0, int max = int.MaxValue)
                            {
                            return Enumerable.Repeat(0, rand.Next(minLength, maxLength)).Select(i => rand.Next(min, max));
                            }





                            share|improve this answer


























                            • What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

                              – Hengameh
                              Jun 20 '15 at 2:02






                            • 1





                              @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

                              – Ohad Schneider
                              Jun 20 '15 at 7:11
















                            1














                            See http://en.wikipedia.org/wiki/External_sorting. Here is my take on the heap based k-way merge, using a buffered read from the sources to emulate I/O reduction:



                            public class KWayMerger<T>
                            {
                            private readonly IList<T> _sources;
                            private readonly int _bufferSize;
                            private readonly MinHeap<MergeValue<T>> _mergeHeap;
                            private readonly int _indices;

                            public KWayMerger(IList<T> sources, int bufferSize, Comparer<T> comparer = null)
                            {
                            if (sources == null) throw new ArgumentNullException("sources");

                            _sources = sources;
                            _bufferSize = bufferSize;

                            _mergeHeap = new MinHeap<MergeValue<T>>(
                            new MergeComparer<T>(comparer ?? Comparer<T>.Default));
                            _indices = new int[sources.Count];
                            }

                            public T Merge()
                            {
                            for (int i = 0; i <= _sources.Count - 1; i++)
                            AddToMergeHeap(i);

                            var merged = new T[_sources.Sum(s => s.Length)];
                            int mergeIndex = 0;

                            while (_mergeHeap.Count > 0)
                            {
                            var min = _mergeHeap.ExtractDominating();
                            merged[mergeIndex++] = min.Value;
                            if (min.Source != -1) //the last item of the source was extracted
                            AddToMergeHeap(min.Source);
                            }

                            return merged;
                            }

                            private void AddToMergeHeap(int sourceIndex)
                            {
                            var source = _sources[sourceIndex];
                            var start = _indices[sourceIndex];
                            var end = Math.Min(start + _bufferSize - 1, source.Length - 1);

                            if (start > source.Length - 1)
                            return; //we're done with this source

                            for (int i = start; i <= end - 1; i++)
                            _mergeHeap.Add(new MergeValue<T>(-1, source[i]));

                            //only the last item should trigger the next buffered read
                            _mergeHeap.Add(new MergeValue<T>(sourceIndex, source[end]));

                            _indices[sourceIndex] += _bufferSize; //we may have added less items,
                            //but if we did we've reached the end of the source so it doesn't matter
                            }
                            }

                            internal class MergeValue<T>
                            {
                            public int Source { get; private set; }
                            public T Value { get; private set; }

                            public MergeValue(int source, T value)
                            {
                            Value = value;
                            Source = source;
                            }
                            }

                            internal class MergeComparer<T> : IComparer<MergeValue<T>>
                            {
                            public Comparer<T> Comparer { get; private set; }

                            public MergeComparer(Comparer<T> comparer)
                            {
                            if (comparer == null) throw new ArgumentNullException("comparer");
                            Comparer = comparer;
                            }

                            public int Compare(MergeValue<T> x, MergeValue<T> y)
                            {
                            Debug.Assert(x != null && y != null);
                            return Comparer.Compare(x.Value, y.Value);
                            }
                            }


                            Here is one possible implementation of MinHeap<T>. Some tests:



                            [TestMethod]
                            public void TestKWaySort()
                            {
                            var rand = new Random();
                            for (int i = 0; i < 10; i++)
                            AssertKwayMerge(rand);
                            }

                            private static void AssertKwayMerge(Random rand)
                            {
                            var sources = new
                            {
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            };
                            Assert.IsTrue(new KWayMerger<int>(sources, 20).Merge().SequenceEqual(sources.SelectMany(s => s).OrderBy(i => i)));
                            }

                            public static IEnumerable<int> GenerateRandomCollection(Random rand, int minLength, int maxLength, int min = 0, int max = int.MaxValue)
                            {
                            return Enumerable.Repeat(0, rand.Next(minLength, maxLength)).Select(i => rand.Next(min, max));
                            }





                            share|improve this answer


























                            • What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

                              – Hengameh
                              Jun 20 '15 at 2:02






                            • 1





                              @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

                              – Ohad Schneider
                              Jun 20 '15 at 7:11














                            1












                            1








                            1







                            See http://en.wikipedia.org/wiki/External_sorting. Here is my take on the heap based k-way merge, using a buffered read from the sources to emulate I/O reduction:



                            public class KWayMerger<T>
                            {
                            private readonly IList<T> _sources;
                            private readonly int _bufferSize;
                            private readonly MinHeap<MergeValue<T>> _mergeHeap;
                            private readonly int _indices;

                            public KWayMerger(IList<T> sources, int bufferSize, Comparer<T> comparer = null)
                            {
                            if (sources == null) throw new ArgumentNullException("sources");

                            _sources = sources;
                            _bufferSize = bufferSize;

                            _mergeHeap = new MinHeap<MergeValue<T>>(
                            new MergeComparer<T>(comparer ?? Comparer<T>.Default));
                            _indices = new int[sources.Count];
                            }

                            public T Merge()
                            {
                            for (int i = 0; i <= _sources.Count - 1; i++)
                            AddToMergeHeap(i);

                            var merged = new T[_sources.Sum(s => s.Length)];
                            int mergeIndex = 0;

                            while (_mergeHeap.Count > 0)
                            {
                            var min = _mergeHeap.ExtractDominating();
                            merged[mergeIndex++] = min.Value;
                            if (min.Source != -1) //the last item of the source was extracted
                            AddToMergeHeap(min.Source);
                            }

                            return merged;
                            }

                            private void AddToMergeHeap(int sourceIndex)
                            {
                            var source = _sources[sourceIndex];
                            var start = _indices[sourceIndex];
                            var end = Math.Min(start + _bufferSize - 1, source.Length - 1);

                            if (start > source.Length - 1)
                            return; //we're done with this source

                            for (int i = start; i <= end - 1; i++)
                            _mergeHeap.Add(new MergeValue<T>(-1, source[i]));

                            //only the last item should trigger the next buffered read
                            _mergeHeap.Add(new MergeValue<T>(sourceIndex, source[end]));

                            _indices[sourceIndex] += _bufferSize; //we may have added less items,
                            //but if we did we've reached the end of the source so it doesn't matter
                            }
                            }

                            internal class MergeValue<T>
                            {
                            public int Source { get; private set; }
                            public T Value { get; private set; }

                            public MergeValue(int source, T value)
                            {
                            Value = value;
                            Source = source;
                            }
                            }

                            internal class MergeComparer<T> : IComparer<MergeValue<T>>
                            {
                            public Comparer<T> Comparer { get; private set; }

                            public MergeComparer(Comparer<T> comparer)
                            {
                            if (comparer == null) throw new ArgumentNullException("comparer");
                            Comparer = comparer;
                            }

                            public int Compare(MergeValue<T> x, MergeValue<T> y)
                            {
                            Debug.Assert(x != null && y != null);
                            return Comparer.Compare(x.Value, y.Value);
                            }
                            }


                            Here is one possible implementation of MinHeap<T>. Some tests:



                            [TestMethod]
                            public void TestKWaySort()
                            {
                            var rand = new Random();
                            for (int i = 0; i < 10; i++)
                            AssertKwayMerge(rand);
                            }

                            private static void AssertKwayMerge(Random rand)
                            {
                            var sources = new
                            {
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            };
                            Assert.IsTrue(new KWayMerger<int>(sources, 20).Merge().SequenceEqual(sources.SelectMany(s => s).OrderBy(i => i)));
                            }

                            public static IEnumerable<int> GenerateRandomCollection(Random rand, int minLength, int maxLength, int min = 0, int max = int.MaxValue)
                            {
                            return Enumerable.Repeat(0, rand.Next(minLength, maxLength)).Select(i => rand.Next(min, max));
                            }





                            share|improve this answer















                            See http://en.wikipedia.org/wiki/External_sorting. Here is my take on the heap based k-way merge, using a buffered read from the sources to emulate I/O reduction:



                            public class KWayMerger<T>
                            {
                            private readonly IList<T> _sources;
                            private readonly int _bufferSize;
                            private readonly MinHeap<MergeValue<T>> _mergeHeap;
                            private readonly int _indices;

                            public KWayMerger(IList<T> sources, int bufferSize, Comparer<T> comparer = null)
                            {
                            if (sources == null) throw new ArgumentNullException("sources");

                            _sources = sources;
                            _bufferSize = bufferSize;

                            _mergeHeap = new MinHeap<MergeValue<T>>(
                            new MergeComparer<T>(comparer ?? Comparer<T>.Default));
                            _indices = new int[sources.Count];
                            }

                            public T Merge()
                            {
                            for (int i = 0; i <= _sources.Count - 1; i++)
                            AddToMergeHeap(i);

                            var merged = new T[_sources.Sum(s => s.Length)];
                            int mergeIndex = 0;

                            while (_mergeHeap.Count > 0)
                            {
                            var min = _mergeHeap.ExtractDominating();
                            merged[mergeIndex++] = min.Value;
                            if (min.Source != -1) //the last item of the source was extracted
                            AddToMergeHeap(min.Source);
                            }

                            return merged;
                            }

                            private void AddToMergeHeap(int sourceIndex)
                            {
                            var source = _sources[sourceIndex];
                            var start = _indices[sourceIndex];
                            var end = Math.Min(start + _bufferSize - 1, source.Length - 1);

                            if (start > source.Length - 1)
                            return; //we're done with this source

                            for (int i = start; i <= end - 1; i++)
                            _mergeHeap.Add(new MergeValue<T>(-1, source[i]));

                            //only the last item should trigger the next buffered read
                            _mergeHeap.Add(new MergeValue<T>(sourceIndex, source[end]));

                            _indices[sourceIndex] += _bufferSize; //we may have added less items,
                            //but if we did we've reached the end of the source so it doesn't matter
                            }
                            }

                            internal class MergeValue<T>
                            {
                            public int Source { get; private set; }
                            public T Value { get; private set; }

                            public MergeValue(int source, T value)
                            {
                            Value = value;
                            Source = source;
                            }
                            }

                            internal class MergeComparer<T> : IComparer<MergeValue<T>>
                            {
                            public Comparer<T> Comparer { get; private set; }

                            public MergeComparer(Comparer<T> comparer)
                            {
                            if (comparer == null) throw new ArgumentNullException("comparer");
                            Comparer = comparer;
                            }

                            public int Compare(MergeValue<T> x, MergeValue<T> y)
                            {
                            Debug.Assert(x != null && y != null);
                            return Comparer.Compare(x.Value, y.Value);
                            }
                            }


                            Here is one possible implementation of MinHeap<T>. Some tests:



                            [TestMethod]
                            public void TestKWaySort()
                            {
                            var rand = new Random();
                            for (int i = 0; i < 10; i++)
                            AssertKwayMerge(rand);
                            }

                            private static void AssertKwayMerge(Random rand)
                            {
                            var sources = new
                            {
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            GenerateRandomCollection(rand, 10, 30, 0, 30).OrderBy(i => i).ToArray(),
                            };
                            Assert.IsTrue(new KWayMerger<int>(sources, 20).Merge().SequenceEqual(sources.SelectMany(s => s).OrderBy(i => i)));
                            }

                            public static IEnumerable<int> GenerateRandomCollection(Random rand, int minLength, int maxLength, int min = 0, int max = int.MaxValue)
                            {
                            return Enumerable.Repeat(0, rand.Next(minLength, maxLength)).Select(i => rand.Next(min, max));
                            }






                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited May 23 '17 at 12:02









                            Community

                            11




                            11










                            answered Dec 8 '12 at 18:40









                            Ohad SchneiderOhad Schneider

                            26.3k10120157




                            26.3k10120157













                            • What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

                              – Hengameh
                              Jun 20 '15 at 2:02






                            • 1





                              @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

                              – Ohad Schneider
                              Jun 20 '15 at 7:11



















                            • What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

                              – Hengameh
                              Jun 20 '15 at 2:02






                            • 1





                              @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

                              – Ohad Schneider
                              Jun 20 '15 at 7:11

















                            What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

                            – Hengameh
                            Jun 20 '15 at 2:02





                            What is the language of your code? (Sorry, I am new in programming; I am looking for a Java solution).

                            – Hengameh
                            Jun 20 '15 at 2:02




                            1




                            1





                            @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

                            – Ohad Schneider
                            Jun 20 '15 at 7:11





                            @Hengameh it's C#. The syntax is very similar to Java so it shouldn't be too hard to translate it.

                            – Ohad Schneider
                            Jun 20 '15 at 7:11











                            1














                            I wrote this STL-style piece of code that does N-way merge and thought I'd post it here to help prevent others from reinventing the wheel. :)



                            Warning: it's only mildly tested. Test before use. :)



                            You can use it like this:



                            #include <vector>

                            int main()
                            {
                            std::vector<std::vector<int> > v;
                            std::vector<std::vector<int>::iterator> vout;
                            std::vector<int> v1;
                            std::vector<int> v2;
                            v1.push_back(1);
                            v1.push_back(2);
                            v1.push_back(3);
                            v2.push_back(0);
                            v2.push_back(1);
                            v2.push_back(2);
                            v.push_back(v1);
                            v.push_back(v2);
                            multiway_merge(v.begin(), v.end(), std::back_inserter(vout), false);
                            }


                            It also allows using pairs of iterators instead of the containers themselves.



                            If you use Boost.Range, you can remove some of the boilerplate code.



                            The code:



                            #include <algorithm>
                            #include <functional> // std::less
                            #include <iterator>
                            #include <queue> // std::priority_queue
                            #include <utility> // std::pair
                            #include <vector>

                            template<class OutIt>
                            struct multiway_merge_value_insert_iterator : public std::iterator<
                            std::output_iterator_tag, OutIt, ptrdiff_t
                            >
                            {
                            OutIt it;
                            multiway_merge_value_insert_iterator(OutIt const it = OutIt())
                            : it(it) { }

                            multiway_merge_value_insert_iterator &operator++(int)
                            { return *this; }

                            multiway_merge_value_insert_iterator &operator++()
                            { return *this; }

                            multiway_merge_value_insert_iterator &operator *()
                            { return *this; }

                            template<class It>
                            multiway_merge_value_insert_iterator &operator =(It const i)
                            {
                            *this->it = *i;
                            ++this->it;
                            return *this;
                            }
                            };

                            template<class OutIt>
                            multiway_merge_value_insert_iterator<OutIt>
                            multiway_merge_value_inserter(OutIt const it)
                            { return multiway_merge_value_insert_iterator<OutIt>(it); };

                            template<class Less>
                            struct multiway_merge_value_less : private Less
                            {
                            multiway_merge_value_less(Less const &less) : Less(less) { }
                            template<class It1, class It2>
                            bool operator()(
                            std::pair<It1, It1> const &b /* inverted */,
                            std::pair<It2, It2> const &a) const
                            {
                            return b.first != b.second && (
                            a.first == a.second ||
                            this->Less::operator()(*a.first, *b.first));
                            }
                            };

                            struct multiway_merge_default_less
                            {
                            template<class T>
                            bool operator()(T const &a, T const &b) const
                            { return std::less<T>()(a, b); }
                            };

                            template<class R>
                            struct multiway_merge_range_iterator
                            { typedef typename R::iterator type; };

                            template<class R>
                            struct multiway_merge_range_iterator<R const>
                            { typedef typename R::const_iterator type; };

                            template<class It>
                            struct multiway_merge_range_iterator<std::pair<It, It> >
                            { typedef It type; };

                            template<class R>
                            typename R::iterator multiway_merge_range_begin(R &r)
                            { return r.begin(); }

                            template<class R>
                            typename R::iterator multiway_merge_range_end(R &r)
                            { return r.end(); }

                            template<class R>
                            typename R::const_iterator multiway_merge_range_begin(R const &r)
                            { return r.begin(); }

                            template<class R>
                            typename R::const_iterator multiway_merge_range_end(R const &r)
                            { return r.end(); }

                            template<class It>
                            It multiway_merge_range_begin(std::pair<It, It> const &r)
                            { return r.first; }

                            template<class It>
                            It multiway_merge_range_end(std::pair<It, It> const &r)
                            { return r.second; }

                            template<class It, class OutIt, class Less, class PQ>
                            OutIt multiway_merge(
                            It begin, It const end, OutIt out, Less const &less,
                            PQ &pq, bool const distinct = false)
                            {
                            while (begin != end)
                            {
                            pq.push(typename PQ::value_type(
                            multiway_merge_range_begin(*begin),
                            multiway_merge_range_end(*begin)));
                            ++begin;
                            }
                            while (!pq.empty())
                            {
                            typename PQ::value_type top = pq.top();
                            pq.pop();
                            if (top.first != top.second)
                            {
                            while (!pq.empty() && pq.top().first == pq.top().second)
                            { pq.pop(); }
                            if (!distinct ||
                            pq.empty() ||
                            less(*pq.top().first, *top.first) ||
                            less(*top.first, *pq.top().first))
                            {
                            *out = top.first;
                            ++out;
                            }

                            ++top.first;
                            pq.push(top);
                            }
                            }
                            return out;
                            }

                            template<class It, class OutIt, class Less>
                            OutIt multiway_merge(
                            It const begin, It const end, OutIt out, Less const &less,
                            bool const distinct = false)
                            {
                            typedef typename multiway_merge_range_iterator<
                            typename std::iterator_traits<It>::value_type
                            >::type SubIt;
                            if (std::distance(begin, end) < 16)
                            {
                            typedef std::vector<std::pair<SubIt, SubIt> > Remaining;
                            Remaining remaining;
                            remaining.reserve(
                            static_cast<size_t>(std::distance(begin, end)));
                            for (It i = begin; i != end; ++i)
                            {
                            if (multiway_merge_range_begin(*i) !=
                            multiway_merge_range_end(*i))
                            {
                            remaining.push_back(std::make_pair(
                            multiway_merge_range_begin(*i),
                            multiway_merge_range_end(*i)));
                            }
                            }
                            while (!remaining.empty())
                            {
                            typename Remaining::iterator smallest =
                            remaining.begin();
                            for (typename Remaining::iterator
                            i = remaining.begin();
                            i != remaining.end();
                            )
                            {
                            if (less(*i->first, *smallest->first))
                            {
                            smallest = i;
                            ++i;
                            }
                            else if (distinct && i != smallest &&
                            !less(
                            *smallest->first,
                            *i->first))
                            {
                            i = remaining.erase(i);
                            }
                            else { ++i; }
                            }
                            *out = smallest->first;
                            ++out;
                            ++smallest->first;
                            if (smallest->first == smallest->second)
                            { smallest = remaining.erase(smallest); }
                            }
                            return out;
                            }
                            else
                            {
                            std::priority_queue<
                            std::pair<SubIt, SubIt>,
                            std::vector<std::pair<SubIt, SubIt> >,
                            multiway_merge_value_less<Less>
                            > q((multiway_merge_value_less<Less>(less)));
                            return multiway_merge(begin, end, out, less, q, distinct);
                            }
                            }

                            template<class It, class OutIt>
                            OutIt multiway_merge(
                            It const begin, It const end, OutIt const out,
                            bool const distinct = false)
                            {
                            return multiway_merge(
                            begin, end, out,
                            multiway_merge_default_less(), distinct);
                            }





                            share|improve this answer






























                              1














                              I wrote this STL-style piece of code that does N-way merge and thought I'd post it here to help prevent others from reinventing the wheel. :)



                              Warning: it's only mildly tested. Test before use. :)



                              You can use it like this:



                              #include <vector>

                              int main()
                              {
                              std::vector<std::vector<int> > v;
                              std::vector<std::vector<int>::iterator> vout;
                              std::vector<int> v1;
                              std::vector<int> v2;
                              v1.push_back(1);
                              v1.push_back(2);
                              v1.push_back(3);
                              v2.push_back(0);
                              v2.push_back(1);
                              v2.push_back(2);
                              v.push_back(v1);
                              v.push_back(v2);
                              multiway_merge(v.begin(), v.end(), std::back_inserter(vout), false);
                              }


                              It also allows using pairs of iterators instead of the containers themselves.



                              If you use Boost.Range, you can remove some of the boilerplate code.



                              The code:



                              #include <algorithm>
                              #include <functional> // std::less
                              #include <iterator>
                              #include <queue> // std::priority_queue
                              #include <utility> // std::pair
                              #include <vector>

                              template<class OutIt>
                              struct multiway_merge_value_insert_iterator : public std::iterator<
                              std::output_iterator_tag, OutIt, ptrdiff_t
                              >
                              {
                              OutIt it;
                              multiway_merge_value_insert_iterator(OutIt const it = OutIt())
                              : it(it) { }

                              multiway_merge_value_insert_iterator &operator++(int)
                              { return *this; }

                              multiway_merge_value_insert_iterator &operator++()
                              { return *this; }

                              multiway_merge_value_insert_iterator &operator *()
                              { return *this; }

                              template<class It>
                              multiway_merge_value_insert_iterator &operator =(It const i)
                              {
                              *this->it = *i;
                              ++this->it;
                              return *this;
                              }
                              };

                              template<class OutIt>
                              multiway_merge_value_insert_iterator<OutIt>
                              multiway_merge_value_inserter(OutIt const it)
                              { return multiway_merge_value_insert_iterator<OutIt>(it); };

                              template<class Less>
                              struct multiway_merge_value_less : private Less
                              {
                              multiway_merge_value_less(Less const &less) : Less(less) { }
                              template<class It1, class It2>
                              bool operator()(
                              std::pair<It1, It1> const &b /* inverted */,
                              std::pair<It2, It2> const &a) const
                              {
                              return b.first != b.second && (
                              a.first == a.second ||
                              this->Less::operator()(*a.first, *b.first));
                              }
                              };

                              struct multiway_merge_default_less
                              {
                              template<class T>
                              bool operator()(T const &a, T const &b) const
                              { return std::less<T>()(a, b); }
                              };

                              template<class R>
                              struct multiway_merge_range_iterator
                              { typedef typename R::iterator type; };

                              template<class R>
                              struct multiway_merge_range_iterator<R const>
                              { typedef typename R::const_iterator type; };

                              template<class It>
                              struct multiway_merge_range_iterator<std::pair<It, It> >
                              { typedef It type; };

                              template<class R>
                              typename R::iterator multiway_merge_range_begin(R &r)
                              { return r.begin(); }

                              template<class R>
                              typename R::iterator multiway_merge_range_end(R &r)
                              { return r.end(); }

                              template<class R>
                              typename R::const_iterator multiway_merge_range_begin(R const &r)
                              { return r.begin(); }

                              template<class R>
                              typename R::const_iterator multiway_merge_range_end(R const &r)
                              { return r.end(); }

                              template<class It>
                              It multiway_merge_range_begin(std::pair<It, It> const &r)
                              { return r.first; }

                              template<class It>
                              It multiway_merge_range_end(std::pair<It, It> const &r)
                              { return r.second; }

                              template<class It, class OutIt, class Less, class PQ>
                              OutIt multiway_merge(
                              It begin, It const end, OutIt out, Less const &less,
                              PQ &pq, bool const distinct = false)
                              {
                              while (begin != end)
                              {
                              pq.push(typename PQ::value_type(
                              multiway_merge_range_begin(*begin),
                              multiway_merge_range_end(*begin)));
                              ++begin;
                              }
                              while (!pq.empty())
                              {
                              typename PQ::value_type top = pq.top();
                              pq.pop();
                              if (top.first != top.second)
                              {
                              while (!pq.empty() && pq.top().first == pq.top().second)
                              { pq.pop(); }
                              if (!distinct ||
                              pq.empty() ||
                              less(*pq.top().first, *top.first) ||
                              less(*top.first, *pq.top().first))
                              {
                              *out = top.first;
                              ++out;
                              }

                              ++top.first;
                              pq.push(top);
                              }
                              }
                              return out;
                              }

                              template<class It, class OutIt, class Less>
                              OutIt multiway_merge(
                              It const begin, It const end, OutIt out, Less const &less,
                              bool const distinct = false)
                              {
                              typedef typename multiway_merge_range_iterator<
                              typename std::iterator_traits<It>::value_type
                              >::type SubIt;
                              if (std::distance(begin, end) < 16)
                              {
                              typedef std::vector<std::pair<SubIt, SubIt> > Remaining;
                              Remaining remaining;
                              remaining.reserve(
                              static_cast<size_t>(std::distance(begin, end)));
                              for (It i = begin; i != end; ++i)
                              {
                              if (multiway_merge_range_begin(*i) !=
                              multiway_merge_range_end(*i))
                              {
                              remaining.push_back(std::make_pair(
                              multiway_merge_range_begin(*i),
                              multiway_merge_range_end(*i)));
                              }
                              }
                              while (!remaining.empty())
                              {
                              typename Remaining::iterator smallest =
                              remaining.begin();
                              for (typename Remaining::iterator
                              i = remaining.begin();
                              i != remaining.end();
                              )
                              {
                              if (less(*i->first, *smallest->first))
                              {
                              smallest = i;
                              ++i;
                              }
                              else if (distinct && i != smallest &&
                              !less(
                              *smallest->first,
                              *i->first))
                              {
                              i = remaining.erase(i);
                              }
                              else { ++i; }
                              }
                              *out = smallest->first;
                              ++out;
                              ++smallest->first;
                              if (smallest->first == smallest->second)
                              { smallest = remaining.erase(smallest); }
                              }
                              return out;
                              }
                              else
                              {
                              std::priority_queue<
                              std::pair<SubIt, SubIt>,
                              std::vector<std::pair<SubIt, SubIt> >,
                              multiway_merge_value_less<Less>
                              > q((multiway_merge_value_less<Less>(less)));
                              return multiway_merge(begin, end, out, less, q, distinct);
                              }
                              }

                              template<class It, class OutIt>
                              OutIt multiway_merge(
                              It const begin, It const end, OutIt const out,
                              bool const distinct = false)
                              {
                              return multiway_merge(
                              begin, end, out,
                              multiway_merge_default_less(), distinct);
                              }





                              share|improve this answer




























                                1












                                1








                                1







                                I wrote this STL-style piece of code that does N-way merge and thought I'd post it here to help prevent others from reinventing the wheel. :)



                                Warning: it's only mildly tested. Test before use. :)



                                You can use it like this:



                                #include <vector>

                                int main()
                                {
                                std::vector<std::vector<int> > v;
                                std::vector<std::vector<int>::iterator> vout;
                                std::vector<int> v1;
                                std::vector<int> v2;
                                v1.push_back(1);
                                v1.push_back(2);
                                v1.push_back(3);
                                v2.push_back(0);
                                v2.push_back(1);
                                v2.push_back(2);
                                v.push_back(v1);
                                v.push_back(v2);
                                multiway_merge(v.begin(), v.end(), std::back_inserter(vout), false);
                                }


                                It also allows using pairs of iterators instead of the containers themselves.



                                If you use Boost.Range, you can remove some of the boilerplate code.



                                The code:



                                #include <algorithm>
                                #include <functional> // std::less
                                #include <iterator>
                                #include <queue> // std::priority_queue
                                #include <utility> // std::pair
                                #include <vector>

                                template<class OutIt>
                                struct multiway_merge_value_insert_iterator : public std::iterator<
                                std::output_iterator_tag, OutIt, ptrdiff_t
                                >
                                {
                                OutIt it;
                                multiway_merge_value_insert_iterator(OutIt const it = OutIt())
                                : it(it) { }

                                multiway_merge_value_insert_iterator &operator++(int)
                                { return *this; }

                                multiway_merge_value_insert_iterator &operator++()
                                { return *this; }

                                multiway_merge_value_insert_iterator &operator *()
                                { return *this; }

                                template<class It>
                                multiway_merge_value_insert_iterator &operator =(It const i)
                                {
                                *this->it = *i;
                                ++this->it;
                                return *this;
                                }
                                };

                                template<class OutIt>
                                multiway_merge_value_insert_iterator<OutIt>
                                multiway_merge_value_inserter(OutIt const it)
                                { return multiway_merge_value_insert_iterator<OutIt>(it); };

                                template<class Less>
                                struct multiway_merge_value_less : private Less
                                {
                                multiway_merge_value_less(Less const &less) : Less(less) { }
                                template<class It1, class It2>
                                bool operator()(
                                std::pair<It1, It1> const &b /* inverted */,
                                std::pair<It2, It2> const &a) const
                                {
                                return b.first != b.second && (
                                a.first == a.second ||
                                this->Less::operator()(*a.first, *b.first));
                                }
                                };

                                struct multiway_merge_default_less
                                {
                                template<class T>
                                bool operator()(T const &a, T const &b) const
                                { return std::less<T>()(a, b); }
                                };

                                template<class R>
                                struct multiway_merge_range_iterator
                                { typedef typename R::iterator type; };

                                template<class R>
                                struct multiway_merge_range_iterator<R const>
                                { typedef typename R::const_iterator type; };

                                template<class It>
                                struct multiway_merge_range_iterator<std::pair<It, It> >
                                { typedef It type; };

                                template<class R>
                                typename R::iterator multiway_merge_range_begin(R &r)
                                { return r.begin(); }

                                template<class R>
                                typename R::iterator multiway_merge_range_end(R &r)
                                { return r.end(); }

                                template<class R>
                                typename R::const_iterator multiway_merge_range_begin(R const &r)
                                { return r.begin(); }

                                template<class R>
                                typename R::const_iterator multiway_merge_range_end(R const &r)
                                { return r.end(); }

                                template<class It>
                                It multiway_merge_range_begin(std::pair<It, It> const &r)
                                { return r.first; }

                                template<class It>
                                It multiway_merge_range_end(std::pair<It, It> const &r)
                                { return r.second; }

                                template<class It, class OutIt, class Less, class PQ>
                                OutIt multiway_merge(
                                It begin, It const end, OutIt out, Less const &less,
                                PQ &pq, bool const distinct = false)
                                {
                                while (begin != end)
                                {
                                pq.push(typename PQ::value_type(
                                multiway_merge_range_begin(*begin),
                                multiway_merge_range_end(*begin)));
                                ++begin;
                                }
                                while (!pq.empty())
                                {
                                typename PQ::value_type top = pq.top();
                                pq.pop();
                                if (top.first != top.second)
                                {
                                while (!pq.empty() && pq.top().first == pq.top().second)
                                { pq.pop(); }
                                if (!distinct ||
                                pq.empty() ||
                                less(*pq.top().first, *top.first) ||
                                less(*top.first, *pq.top().first))
                                {
                                *out = top.first;
                                ++out;
                                }

                                ++top.first;
                                pq.push(top);
                                }
                                }
                                return out;
                                }

                                template<class It, class OutIt, class Less>
                                OutIt multiway_merge(
                                It const begin, It const end, OutIt out, Less const &less,
                                bool const distinct = false)
                                {
                                typedef typename multiway_merge_range_iterator<
                                typename std::iterator_traits<It>::value_type
                                >::type SubIt;
                                if (std::distance(begin, end) < 16)
                                {
                                typedef std::vector<std::pair<SubIt, SubIt> > Remaining;
                                Remaining remaining;
                                remaining.reserve(
                                static_cast<size_t>(std::distance(begin, end)));
                                for (It i = begin; i != end; ++i)
                                {
                                if (multiway_merge_range_begin(*i) !=
                                multiway_merge_range_end(*i))
                                {
                                remaining.push_back(std::make_pair(
                                multiway_merge_range_begin(*i),
                                multiway_merge_range_end(*i)));
                                }
                                }
                                while (!remaining.empty())
                                {
                                typename Remaining::iterator smallest =
                                remaining.begin();
                                for (typename Remaining::iterator
                                i = remaining.begin();
                                i != remaining.end();
                                )
                                {
                                if (less(*i->first, *smallest->first))
                                {
                                smallest = i;
                                ++i;
                                }
                                else if (distinct && i != smallest &&
                                !less(
                                *smallest->first,
                                *i->first))
                                {
                                i = remaining.erase(i);
                                }
                                else { ++i; }
                                }
                                *out = smallest->first;
                                ++out;
                                ++smallest->first;
                                if (smallest->first == smallest->second)
                                { smallest = remaining.erase(smallest); }
                                }
                                return out;
                                }
                                else
                                {
                                std::priority_queue<
                                std::pair<SubIt, SubIt>,
                                std::vector<std::pair<SubIt, SubIt> >,
                                multiway_merge_value_less<Less>
                                > q((multiway_merge_value_less<Less>(less)));
                                return multiway_merge(begin, end, out, less, q, distinct);
                                }
                                }

                                template<class It, class OutIt>
                                OutIt multiway_merge(
                                It const begin, It const end, OutIt const out,
                                bool const distinct = false)
                                {
                                return multiway_merge(
                                begin, end, out,
                                multiway_merge_default_less(), distinct);
                                }





                                share|improve this answer















                                I wrote this STL-style piece of code that does N-way merge and thought I'd post it here to help prevent others from reinventing the wheel. :)



                                Warning: it's only mildly tested. Test before use. :)



                                You can use it like this:



                                #include <vector>

                                int main()
                                {
                                std::vector<std::vector<int> > v;
                                std::vector<std::vector<int>::iterator> vout;
                                std::vector<int> v1;
                                std::vector<int> v2;
                                v1.push_back(1);
                                v1.push_back(2);
                                v1.push_back(3);
                                v2.push_back(0);
                                v2.push_back(1);
                                v2.push_back(2);
                                v.push_back(v1);
                                v.push_back(v2);
                                multiway_merge(v.begin(), v.end(), std::back_inserter(vout), false);
                                }


                                It also allows using pairs of iterators instead of the containers themselves.



                                If you use Boost.Range, you can remove some of the boilerplate code.



                                The code:



                                #include <algorithm>
                                #include <functional> // std::less
                                #include <iterator>
                                #include <queue> // std::priority_queue
                                #include <utility> // std::pair
                                #include <vector>

                                template<class OutIt>
                                struct multiway_merge_value_insert_iterator : public std::iterator<
                                std::output_iterator_tag, OutIt, ptrdiff_t
                                >
                                {
                                OutIt it;
                                multiway_merge_value_insert_iterator(OutIt const it = OutIt())
                                : it(it) { }

                                multiway_merge_value_insert_iterator &operator++(int)
                                { return *this; }

                                multiway_merge_value_insert_iterator &operator++()
                                { return *this; }

                                multiway_merge_value_insert_iterator &operator *()
                                { return *this; }

                                template<class It>
                                multiway_merge_value_insert_iterator &operator =(It const i)
                                {
                                *this->it = *i;
                                ++this->it;
                                return *this;
                                }
                                };

                                template<class OutIt>
                                multiway_merge_value_insert_iterator<OutIt>
                                multiway_merge_value_inserter(OutIt const it)
                                { return multiway_merge_value_insert_iterator<OutIt>(it); };

                                template<class Less>
                                struct multiway_merge_value_less : private Less
                                {
                                multiway_merge_value_less(Less const &less) : Less(less) { }
                                template<class It1, class It2>
                                bool operator()(
                                std::pair<It1, It1> const &b /* inverted */,
                                std::pair<It2, It2> const &a) const
                                {
                                return b.first != b.second && (
                                a.first == a.second ||
                                this->Less::operator()(*a.first, *b.first));
                                }
                                };

                                struct multiway_merge_default_less
                                {
                                template<class T>
                                bool operator()(T const &a, T const &b) const
                                { return std::less<T>()(a, b); }
                                };

                                template<class R>
                                struct multiway_merge_range_iterator
                                { typedef typename R::iterator type; };

                                template<class R>
                                struct multiway_merge_range_iterator<R const>
                                { typedef typename R::const_iterator type; };

                                template<class It>
                                struct multiway_merge_range_iterator<std::pair<It, It> >
                                { typedef It type; };

                                template<class R>
                                typename R::iterator multiway_merge_range_begin(R &r)
                                { return r.begin(); }

                                template<class R>
                                typename R::iterator multiway_merge_range_end(R &r)
                                { return r.end(); }

                                template<class R>
                                typename R::const_iterator multiway_merge_range_begin(R const &r)
                                { return r.begin(); }

                                template<class R>
                                typename R::const_iterator multiway_merge_range_end(R const &r)
                                { return r.end(); }

                                template<class It>
                                It multiway_merge_range_begin(std::pair<It, It> const &r)
                                { return r.first; }

                                template<class It>
                                It multiway_merge_range_end(std::pair<It, It> const &r)
                                { return r.second; }

                                template<class It, class OutIt, class Less, class PQ>
                                OutIt multiway_merge(
                                It begin, It const end, OutIt out, Less const &less,
                                PQ &pq, bool const distinct = false)
                                {
                                while (begin != end)
                                {
                                pq.push(typename PQ::value_type(
                                multiway_merge_range_begin(*begin),
                                multiway_merge_range_end(*begin)));
                                ++begin;
                                }
                                while (!pq.empty())
                                {
                                typename PQ::value_type top = pq.top();
                                pq.pop();
                                if (top.first != top.second)
                                {
                                while (!pq.empty() && pq.top().first == pq.top().second)
                                { pq.pop(); }
                                if (!distinct ||
                                pq.empty() ||
                                less(*pq.top().first, *top.first) ||
                                less(*top.first, *pq.top().first))
                                {
                                *out = top.first;
                                ++out;
                                }

                                ++top.first;
                                pq.push(top);
                                }
                                }
                                return out;
                                }

                                template<class It, class OutIt, class Less>
                                OutIt multiway_merge(
                                It const begin, It const end, OutIt out, Less const &less,
                                bool const distinct = false)
                                {
                                typedef typename multiway_merge_range_iterator<
                                typename std::iterator_traits<It>::value_type
                                >::type SubIt;
                                if (std::distance(begin, end) < 16)
                                {
                                typedef std::vector<std::pair<SubIt, SubIt> > Remaining;
                                Remaining remaining;
                                remaining.reserve(
                                static_cast<size_t>(std::distance(begin, end)));
                                for (It i = begin; i != end; ++i)
                                {
                                if (multiway_merge_range_begin(*i) !=
                                multiway_merge_range_end(*i))
                                {
                                remaining.push_back(std::make_pair(
                                multiway_merge_range_begin(*i),
                                multiway_merge_range_end(*i)));
                                }
                                }
                                while (!remaining.empty())
                                {
                                typename Remaining::iterator smallest =
                                remaining.begin();
                                for (typename Remaining::iterator
                                i = remaining.begin();
                                i != remaining.end();
                                )
                                {
                                if (less(*i->first, *smallest->first))
                                {
                                smallest = i;
                                ++i;
                                }
                                else if (distinct && i != smallest &&
                                !less(
                                *smallest->first,
                                *i->first))
                                {
                                i = remaining.erase(i);
                                }
                                else { ++i; }
                                }
                                *out = smallest->first;
                                ++out;
                                ++smallest->first;
                                if (smallest->first == smallest->second)
                                { smallest = remaining.erase(smallest); }
                                }
                                return out;
                                }
                                else
                                {
                                std::priority_queue<
                                std::pair<SubIt, SubIt>,
                                std::vector<std::pair<SubIt, SubIt> >,
                                multiway_merge_value_less<Less>
                                > q((multiway_merge_value_less<Less>(less)));
                                return multiway_merge(begin, end, out, less, q, distinct);
                                }
                                }

                                template<class It, class OutIt>
                                OutIt multiway_merge(
                                It const begin, It const end, OutIt const out,
                                bool const distinct = false)
                                {
                                return multiway_merge(
                                begin, end, out,
                                multiway_merge_default_less(), distinct);
                                }






                                share|improve this answer














                                share|improve this answer



                                share|improve this answer








                                edited Jan 4 '13 at 21:44

























                                answered Jan 4 '13 at 21:37









                                MehrdadMehrdad

                                128k87408752




                                128k87408752























                                    1














                                    Here is my implementation using MinHeap...

                                    package merging;

                                    import java.io.BufferedReader;
                                    import java.io.BufferedWriter;
                                    import java.io.File;
                                    import java.io.FileReader;
                                    import java.io.FileWriter;
                                    import java.io.IOException;
                                    import java.io.PrintWriter;


                                    public class N_Way_Merge {

                                    int No_of_files=0;
                                    String listString;
                                    int listIndex;
                                    PrintWriter pw;
                                    private String fileDir = "D:\XMLParsing_Files\Extracted_Data";
                                    private File fileList;
                                    private BufferedReader readers;

                                    public static void main(String args) throws IOException {

                                    N_Way_Merge nwm=new N_Way_Merge();

                                    long start= System.currentTimeMillis();

                                    try {

                                    nwm.createFileList();

                                    nwm.createReaders();
                                    nwm.createMinHeap();
                                    }
                                    finally {
                                    nwm.pw.flush();
                                    nwm.pw.close();
                                    for (BufferedReader readers : nwm.readers) {

                                    readers.close();

                                    }
                                    }
                                    long end = System.currentTimeMillis();
                                    System.out.println("Files merged into a single file.nTime taken: "+((end-start)/1000)+"secs");
                                    }

                                    public void createFileList() throws IOException {
                                    //creates a list of sorted files present in a particular directory
                                    File folder = new File(fileDir);
                                    fileList = folder.listFiles();
                                    No_of_files=fileList.length;
                                    assign();
                                    System.out.println("No. of files - "+ No_of_files);

                                    }

                                    public void assign() throws IOException
                                    {
                                    listString = new String[No_of_files];
                                    listIndex = new int[No_of_files];
                                    pw = new PrintWriter(new BufferedWriter(new FileWriter("D:\XMLParsing_Files\Final.txt", true)));
                                    }

                                    public void createReaders() throws IOException {
                                    //creates array of BufferedReaders to read the files
                                    readers = new BufferedReader[No_of_files];

                                    for(int i=0;i<No_of_files;++i)
                                    {
                                    readers[i]=new BufferedReader(new FileReader(fileList[i]));
                                    }
                                    }

                                    public void createMinHeap() throws IOException {

                                    for(int i=0;i<No_of_files;i++)
                                    {
                                    listString[i]=readers[i].readLine();
                                    listIndex[i]=i;
                                    }

                                    WriteToFile(listString,listIndex);

                                    }

                                    public void WriteToFile(String listString,int listIndex) throws IOException{

                                    BuildHeap_forFirstTime(listString, listIndex);
                                    while(!(listString[0].equals("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz")))
                                    {
                                    pw.println(listString[0]);
                                    listString[0]=readers[listIndex[0]].readLine();

                                    MinHeapify(listString,listIndex,0);
                                    }

                                    }
                                    public void BuildHeap_forFirstTime(String listString,int listIndex){

                                    for(int i=(No_of_files/2)-1;i>=0;--i)
                                    MinHeapify(listString,listIndex,i);

                                    }

                                    public void MinHeapify(String listString,int listIndex,int index){

                                    int left=index*2 + 1;
                                    int right=left + 1;
                                    int smallest=index;
                                    int HeapSize=No_of_files;
                                    if(left <= HeapSize-1 && listString[left]!=null && (listString[left].compareTo(listString[index])) < 0)
                                    smallest = left;

                                    if(right <= HeapSize-1 && listString[right]!=null && (listString[right].compareTo(listString[smallest])) < 0)
                                    smallest=right;



                                    if(smallest!=index)
                                    {
                                    String temp=listString[index];
                                    listString[index]=listString[smallest];
                                    listString[smallest]=temp;

                                    listIndex[smallest]^=listIndex[index];
                                    listIndex[index]^=listIndex[smallest];
                                    listIndex[smallest]^=listIndex[index];

                                    MinHeapify(listString,listIndex,smallest);
                                    }

                                    }


                                    }






                                    share|improve this answer




























                                      1














                                      Here is my implementation using MinHeap...

                                      package merging;

                                      import java.io.BufferedReader;
                                      import java.io.BufferedWriter;
                                      import java.io.File;
                                      import java.io.FileReader;
                                      import java.io.FileWriter;
                                      import java.io.IOException;
                                      import java.io.PrintWriter;


                                      public class N_Way_Merge {

                                      int No_of_files=0;
                                      String listString;
                                      int listIndex;
                                      PrintWriter pw;
                                      private String fileDir = "D:\XMLParsing_Files\Extracted_Data";
                                      private File fileList;
                                      private BufferedReader readers;

                                      public static void main(String args) throws IOException {

                                      N_Way_Merge nwm=new N_Way_Merge();

                                      long start= System.currentTimeMillis();

                                      try {

                                      nwm.createFileList();

                                      nwm.createReaders();
                                      nwm.createMinHeap();
                                      }
                                      finally {
                                      nwm.pw.flush();
                                      nwm.pw.close();
                                      for (BufferedReader readers : nwm.readers) {

                                      readers.close();

                                      }
                                      }
                                      long end = System.currentTimeMillis();
                                      System.out.println("Files merged into a single file.nTime taken: "+((end-start)/1000)+"secs");
                                      }

                                      public void createFileList() throws IOException {
                                      //creates a list of sorted files present in a particular directory
                                      File folder = new File(fileDir);
                                      fileList = folder.listFiles();
                                      No_of_files=fileList.length;
                                      assign();
                                      System.out.println("No. of files - "+ No_of_files);

                                      }

                                      public void assign() throws IOException
                                      {
                                      listString = new String[No_of_files];
                                      listIndex = new int[No_of_files];
                                      pw = new PrintWriter(new BufferedWriter(new FileWriter("D:\XMLParsing_Files\Final.txt", true)));
                                      }

                                      public void createReaders() throws IOException {
                                      //creates array of BufferedReaders to read the files
                                      readers = new BufferedReader[No_of_files];

                                      for(int i=0;i<No_of_files;++i)
                                      {
                                      readers[i]=new BufferedReader(new FileReader(fileList[i]));
                                      }
                                      }

                                      public void createMinHeap() throws IOException {

                                      for(int i=0;i<No_of_files;i++)
                                      {
                                      listString[i]=readers[i].readLine();
                                      listIndex[i]=i;
                                      }

                                      WriteToFile(listString,listIndex);

                                      }

                                      public void WriteToFile(String listString,int listIndex) throws IOException{

                                      BuildHeap_forFirstTime(listString, listIndex);
                                      while(!(listString[0].equals("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz")))
                                      {
                                      pw.println(listString[0]);
                                      listString[0]=readers[listIndex[0]].readLine();

                                      MinHeapify(listString,listIndex,0);
                                      }

                                      }
                                      public void BuildHeap_forFirstTime(String listString,int listIndex){

                                      for(int i=(No_of_files/2)-1;i>=0;--i)
                                      MinHeapify(listString,listIndex,i);

                                      }

                                      public void MinHeapify(String listString,int listIndex,int index){

                                      int left=index*2 + 1;
                                      int right=left + 1;
                                      int smallest=index;
                                      int HeapSize=No_of_files;
                                      if(left <= HeapSize-1 && listString[left]!=null && (listString[left].compareTo(listString[index])) < 0)
                                      smallest = left;

                                      if(right <= HeapSize-1 && listString[right]!=null && (listString[right].compareTo(listString[smallest])) < 0)
                                      smallest=right;



                                      if(smallest!=index)
                                      {
                                      String temp=listString[index];
                                      listString[index]=listString[smallest];
                                      listString[smallest]=temp;

                                      listIndex[smallest]^=listIndex[index];
                                      listIndex[index]^=listIndex[smallest];
                                      listIndex[smallest]^=listIndex[index];

                                      MinHeapify(listString,listIndex,smallest);
                                      }

                                      }


                                      }






                                      share|improve this answer


























                                        1












                                        1








                                        1







                                        Here is my implementation using MinHeap...

                                        package merging;

                                        import java.io.BufferedReader;
                                        import java.io.BufferedWriter;
                                        import java.io.File;
                                        import java.io.FileReader;
                                        import java.io.FileWriter;
                                        import java.io.IOException;
                                        import java.io.PrintWriter;


                                        public class N_Way_Merge {

                                        int No_of_files=0;
                                        String listString;
                                        int listIndex;
                                        PrintWriter pw;
                                        private String fileDir = "D:\XMLParsing_Files\Extracted_Data";
                                        private File fileList;
                                        private BufferedReader readers;

                                        public static void main(String args) throws IOException {

                                        N_Way_Merge nwm=new N_Way_Merge();

                                        long start= System.currentTimeMillis();

                                        try {

                                        nwm.createFileList();

                                        nwm.createReaders();
                                        nwm.createMinHeap();
                                        }
                                        finally {
                                        nwm.pw.flush();
                                        nwm.pw.close();
                                        for (BufferedReader readers : nwm.readers) {

                                        readers.close();

                                        }
                                        }
                                        long end = System.currentTimeMillis();
                                        System.out.println("Files merged into a single file.nTime taken: "+((end-start)/1000)+"secs");
                                        }

                                        public void createFileList() throws IOException {
                                        //creates a list of sorted files present in a particular directory
                                        File folder = new File(fileDir);
                                        fileList = folder.listFiles();
                                        No_of_files=fileList.length;
                                        assign();
                                        System.out.println("No. of files - "+ No_of_files);

                                        }

                                        public void assign() throws IOException
                                        {
                                        listString = new String[No_of_files];
                                        listIndex = new int[No_of_files];
                                        pw = new PrintWriter(new BufferedWriter(new FileWriter("D:\XMLParsing_Files\Final.txt", true)));
                                        }

                                        public void createReaders() throws IOException {
                                        //creates array of BufferedReaders to read the files
                                        readers = new BufferedReader[No_of_files];

                                        for(int i=0;i<No_of_files;++i)
                                        {
                                        readers[i]=new BufferedReader(new FileReader(fileList[i]));
                                        }
                                        }

                                        public void createMinHeap() throws IOException {

                                        for(int i=0;i<No_of_files;i++)
                                        {
                                        listString[i]=readers[i].readLine();
                                        listIndex[i]=i;
                                        }

                                        WriteToFile(listString,listIndex);

                                        }

                                        public void WriteToFile(String listString,int listIndex) throws IOException{

                                        BuildHeap_forFirstTime(listString, listIndex);
                                        while(!(listString[0].equals("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz")))
                                        {
                                        pw.println(listString[0]);
                                        listString[0]=readers[listIndex[0]].readLine();

                                        MinHeapify(listString,listIndex,0);
                                        }

                                        }
                                        public void BuildHeap_forFirstTime(String listString,int listIndex){

                                        for(int i=(No_of_files/2)-1;i>=0;--i)
                                        MinHeapify(listString,listIndex,i);

                                        }

                                        public void MinHeapify(String listString,int listIndex,int index){

                                        int left=index*2 + 1;
                                        int right=left + 1;
                                        int smallest=index;
                                        int HeapSize=No_of_files;
                                        if(left <= HeapSize-1 && listString[left]!=null && (listString[left].compareTo(listString[index])) < 0)
                                        smallest = left;

                                        if(right <= HeapSize-1 && listString[right]!=null && (listString[right].compareTo(listString[smallest])) < 0)
                                        smallest=right;



                                        if(smallest!=index)
                                        {
                                        String temp=listString[index];
                                        listString[index]=listString[smallest];
                                        listString[smallest]=temp;

                                        listIndex[smallest]^=listIndex[index];
                                        listIndex[index]^=listIndex[smallest];
                                        listIndex[smallest]^=listIndex[index];

                                        MinHeapify(listString,listIndex,smallest);
                                        }

                                        }


                                        }






                                        share|improve this answer













                                        Here is my implementation using MinHeap...

                                        package merging;

                                        import java.io.BufferedReader;
                                        import java.io.BufferedWriter;
                                        import java.io.File;
                                        import java.io.FileReader;
                                        import java.io.FileWriter;
                                        import java.io.IOException;
                                        import java.io.PrintWriter;


                                        public class N_Way_Merge {

                                        int No_of_files=0;
                                        String listString;
                                        int listIndex;
                                        PrintWriter pw;
                                        private String fileDir = "D:\XMLParsing_Files\Extracted_Data";
                                        private File fileList;
                                        private BufferedReader readers;

                                        public static void main(String args) throws IOException {

                                        N_Way_Merge nwm=new N_Way_Merge();

                                        long start= System.currentTimeMillis();

                                        try {

                                        nwm.createFileList();

                                        nwm.createReaders();
                                        nwm.createMinHeap();
                                        }
                                        finally {
                                        nwm.pw.flush();
                                        nwm.pw.close();
                                        for (BufferedReader readers : nwm.readers) {

                                        readers.close();

                                        }
                                        }
                                        long end = System.currentTimeMillis();
                                        System.out.println("Files merged into a single file.nTime taken: "+((end-start)/1000)+"secs");
                                        }

                                        public void createFileList() throws IOException {
                                        //creates a list of sorted files present in a particular directory
                                        File folder = new File(fileDir);
                                        fileList = folder.listFiles();
                                        No_of_files=fileList.length;
                                        assign();
                                        System.out.println("No. of files - "+ No_of_files);

                                        }

                                        public void assign() throws IOException
                                        {
                                        listString = new String[No_of_files];
                                        listIndex = new int[No_of_files];
                                        pw = new PrintWriter(new BufferedWriter(new FileWriter("D:\XMLParsing_Files\Final.txt", true)));
                                        }

                                        public void createReaders() throws IOException {
                                        //creates array of BufferedReaders to read the files
                                        readers = new BufferedReader[No_of_files];

                                        for(int i=0;i<No_of_files;++i)
                                        {
                                        readers[i]=new BufferedReader(new FileReader(fileList[i]));
                                        }
                                        }

                                        public void createMinHeap() throws IOException {

                                        for(int i=0;i<No_of_files;i++)
                                        {
                                        listString[i]=readers[i].readLine();
                                        listIndex[i]=i;
                                        }

                                        WriteToFile(listString,listIndex);

                                        }

                                        public void WriteToFile(String listString,int listIndex) throws IOException{

                                        BuildHeap_forFirstTime(listString, listIndex);
                                        while(!(listString[0].equals("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz")))
                                        {
                                        pw.println(listString[0]);
                                        listString[0]=readers[listIndex[0]].readLine();

                                        MinHeapify(listString,listIndex,0);
                                        }

                                        }
                                        public void BuildHeap_forFirstTime(String listString,int listIndex){

                                        for(int i=(No_of_files/2)-1;i>=0;--i)
                                        MinHeapify(listString,listIndex,i);

                                        }

                                        public void MinHeapify(String listString,int listIndex,int index){

                                        int left=index*2 + 1;
                                        int right=left + 1;
                                        int smallest=index;
                                        int HeapSize=No_of_files;
                                        if(left <= HeapSize-1 && listString[left]!=null && (listString[left].compareTo(listString[index])) < 0)
                                        smallest = left;

                                        if(right <= HeapSize-1 && listString[right]!=null && (listString[right].compareTo(listString[smallest])) < 0)
                                        smallest=right;



                                        if(smallest!=index)
                                        {
                                        String temp=listString[index];
                                        listString[index]=listString[smallest];
                                        listString[smallest]=temp;

                                        listIndex[smallest]^=listIndex[index];
                                        listIndex[index]^=listIndex[smallest];
                                        listIndex[smallest]^=listIndex[index];

                                        MinHeapify(listString,listIndex,smallest);
                                        }

                                        }


                                        }







                                        share|improve this answer












                                        share|improve this answer



                                        share|improve this answer










                                        answered Oct 3 '13 at 21:32









                                        Sumit Kumar SahaSumit Kumar Saha

                                        5111921




                                        5111921























                                            0














                                            Java implementation of min heap algorithm for merging k sorted arrays:



                                            public class MergeKSorted {

                                            /**
                                            * helper object to store min value of each array in a priority queue,
                                            * the kth array and the index into kth array
                                            *
                                            */
                                            static class PQNode implements Comparable<PQNode>{
                                            int value;
                                            int kth = 0;
                                            int indexKth = 0;

                                            public PQNode(int value, int kth, int indexKth) {
                                            this.value = value;
                                            this.kth = kth;
                                            this.indexKth = indexKth;
                                            }
                                            @Override
                                            public int compareTo(PQNode o) {
                                            if(o != null) {
                                            return Integer.valueOf(value).compareTo(Integer.valueOf(o.value));
                                            }
                                            else return 0;
                                            }

                                            @Override
                                            public String toString() {
                                            return value+" "+kth+" "+indexKth;
                                            }
                                            }
                                            public static void mergeKSorted(int sortedArrays) {
                                            int k = sortedArrays.length;
                                            int resultCtr = 0;
                                            int totalSize = 0;
                                            PriorityQueue<PQNode> pq = new PriorityQueue<>();
                                            for(int i=0; i<k; i++) {
                                            int kthArray = sortedArrays[i];
                                            totalSize+=kthArray.length;
                                            if(kthArray.length > 0) {
                                            PQNode temp = new PQNode(kthArray[0], i, 0);
                                            pq.add(temp);
                                            }
                                            }
                                            int result = new int[totalSize];
                                            while(!pq.isEmpty()) {
                                            PQNode temp = pq.poll();
                                            int kthArray = sortedArrays[temp.kth];
                                            result[resultCtr] = temp.value;
                                            resultCtr++;
                                            temp.indexKth++;
                                            if(temp.indexKth < kthArray.length) {
                                            temp = new PQNode(kthArray[temp.indexKth], temp.kth, temp.indexKth);
                                            pq.add(temp);
                                            }

                                            }
                                            print(result);
                                            }

                                            public static void print(int a) {
                                            StringBuilder sb = new StringBuilder();
                                            for(int v : a) {
                                            sb.append(v).append(" ");
                                            }
                                            System.out.println(sb);
                                            }

                                            public static void main(String args) {
                                            int sortedA = {
                                            {3,4,6,9},
                                            {4,6,8,9,12},
                                            {3,4,9},
                                            {1,4,9}
                                            };
                                            mergeKSorted(sortedA);
                                            }

                                            }





                                            share|improve this answer




























                                              0














                                              Java implementation of min heap algorithm for merging k sorted arrays:



                                              public class MergeKSorted {

                                              /**
                                              * helper object to store min value of each array in a priority queue,
                                              * the kth array and the index into kth array
                                              *
                                              */
                                              static class PQNode implements Comparable<PQNode>{
                                              int value;
                                              int kth = 0;
                                              int indexKth = 0;

                                              public PQNode(int value, int kth, int indexKth) {
                                              this.value = value;
                                              this.kth = kth;
                                              this.indexKth = indexKth;
                                              }
                                              @Override
                                              public int compareTo(PQNode o) {
                                              if(o != null) {
                                              return Integer.valueOf(value).compareTo(Integer.valueOf(o.value));
                                              }
                                              else return 0;
                                              }

                                              @Override
                                              public String toString() {
                                              return value+" "+kth+" "+indexKth;
                                              }
                                              }
                                              public static void mergeKSorted(int sortedArrays) {
                                              int k = sortedArrays.length;
                                              int resultCtr = 0;
                                              int totalSize = 0;
                                              PriorityQueue<PQNode> pq = new PriorityQueue<>();
                                              for(int i=0; i<k; i++) {
                                              int kthArray = sortedArrays[i];
                                              totalSize+=kthArray.length;
                                              if(kthArray.length > 0) {
                                              PQNode temp = new PQNode(kthArray[0], i, 0);
                                              pq.add(temp);
                                              }
                                              }
                                              int result = new int[totalSize];
                                              while(!pq.isEmpty()) {
                                              PQNode temp = pq.poll();
                                              int kthArray = sortedArrays[temp.kth];
                                              result[resultCtr] = temp.value;
                                              resultCtr++;
                                              temp.indexKth++;
                                              if(temp.indexKth < kthArray.length) {
                                              temp = new PQNode(kthArray[temp.indexKth], temp.kth, temp.indexKth);
                                              pq.add(temp);
                                              }

                                              }
                                              print(result);
                                              }

                                              public static void print(int a) {
                                              StringBuilder sb = new StringBuilder();
                                              for(int v : a) {
                                              sb.append(v).append(" ");
                                              }
                                              System.out.println(sb);
                                              }

                                              public static void main(String args) {
                                              int sortedA = {
                                              {3,4,6,9},
                                              {4,6,8,9,12},
                                              {3,4,9},
                                              {1,4,9}
                                              };
                                              mergeKSorted(sortedA);
                                              }

                                              }





                                              share|improve this answer


























                                                0












                                                0








                                                0







                                                Java implementation of min heap algorithm for merging k sorted arrays:



                                                public class MergeKSorted {

                                                /**
                                                * helper object to store min value of each array in a priority queue,
                                                * the kth array and the index into kth array
                                                *
                                                */
                                                static class PQNode implements Comparable<PQNode>{
                                                int value;
                                                int kth = 0;
                                                int indexKth = 0;

                                                public PQNode(int value, int kth, int indexKth) {
                                                this.value = value;
                                                this.kth = kth;
                                                this.indexKth = indexKth;
                                                }
                                                @Override
                                                public int compareTo(PQNode o) {
                                                if(o != null) {
                                                return Integer.valueOf(value).compareTo(Integer.valueOf(o.value));
                                                }
                                                else return 0;
                                                }

                                                @Override
                                                public String toString() {
                                                return value+" "+kth+" "+indexKth;
                                                }
                                                }
                                                public static void mergeKSorted(int sortedArrays) {
                                                int k = sortedArrays.length;
                                                int resultCtr = 0;
                                                int totalSize = 0;
                                                PriorityQueue<PQNode> pq = new PriorityQueue<>();
                                                for(int i=0; i<k; i++) {
                                                int kthArray = sortedArrays[i];
                                                totalSize+=kthArray.length;
                                                if(kthArray.length > 0) {
                                                PQNode temp = new PQNode(kthArray[0], i, 0);
                                                pq.add(temp);
                                                }
                                                }
                                                int result = new int[totalSize];
                                                while(!pq.isEmpty()) {
                                                PQNode temp = pq.poll();
                                                int kthArray = sortedArrays[temp.kth];
                                                result[resultCtr] = temp.value;
                                                resultCtr++;
                                                temp.indexKth++;
                                                if(temp.indexKth < kthArray.length) {
                                                temp = new PQNode(kthArray[temp.indexKth], temp.kth, temp.indexKth);
                                                pq.add(temp);
                                                }

                                                }
                                                print(result);
                                                }

                                                public static void print(int a) {
                                                StringBuilder sb = new StringBuilder();
                                                for(int v : a) {
                                                sb.append(v).append(" ");
                                                }
                                                System.out.println(sb);
                                                }

                                                public static void main(String args) {
                                                int sortedA = {
                                                {3,4,6,9},
                                                {4,6,8,9,12},
                                                {3,4,9},
                                                {1,4,9}
                                                };
                                                mergeKSorted(sortedA);
                                                }

                                                }





                                                share|improve this answer













                                                Java implementation of min heap algorithm for merging k sorted arrays:



                                                public class MergeKSorted {

                                                /**
                                                * helper object to store min value of each array in a priority queue,
                                                * the kth array and the index into kth array
                                                *
                                                */
                                                static class PQNode implements Comparable<PQNode>{
                                                int value;
                                                int kth = 0;
                                                int indexKth = 0;

                                                public PQNode(int value, int kth, int indexKth) {
                                                this.value = value;
                                                this.kth = kth;
                                                this.indexKth = indexKth;
                                                }
                                                @Override
                                                public int compareTo(PQNode o) {
                                                if(o != null) {
                                                return Integer.valueOf(value).compareTo(Integer.valueOf(o.value));
                                                }
                                                else return 0;
                                                }

                                                @Override
                                                public String toString() {
                                                return value+" "+kth+" "+indexKth;
                                                }
                                                }
                                                public static void mergeKSorted(int sortedArrays) {
                                                int k = sortedArrays.length;
                                                int resultCtr = 0;
                                                int totalSize = 0;
                                                PriorityQueue<PQNode> pq = new PriorityQueue<>();
                                                for(int i=0; i<k; i++) {
                                                int kthArray = sortedArrays[i];
                                                totalSize+=kthArray.length;
                                                if(kthArray.length > 0) {
                                                PQNode temp = new PQNode(kthArray[0], i, 0);
                                                pq.add(temp);
                                                }
                                                }
                                                int result = new int[totalSize];
                                                while(!pq.isEmpty()) {
                                                PQNode temp = pq.poll();
                                                int kthArray = sortedArrays[temp.kth];
                                                result[resultCtr] = temp.value;
                                                resultCtr++;
                                                temp.indexKth++;
                                                if(temp.indexKth < kthArray.length) {
                                                temp = new PQNode(kthArray[temp.indexKth], temp.kth, temp.indexKth);
                                                pq.add(temp);
                                                }

                                                }
                                                print(result);
                                                }

                                                public static void print(int a) {
                                                StringBuilder sb = new StringBuilder();
                                                for(int v : a) {
                                                sb.append(v).append(" ");
                                                }
                                                System.out.println(sb);
                                                }

                                                public static void main(String args) {
                                                int sortedA = {
                                                {3,4,6,9},
                                                {4,6,8,9,12},
                                                {3,4,9},
                                                {1,4,9}
                                                };
                                                mergeKSorted(sortedA);
                                                }

                                                }






                                                share|improve this answer












                                                share|improve this answer



                                                share|improve this answer










                                                answered Jul 18 '16 at 5:04









                                                susmit shuklasusmit shukla

                                                123113




                                                123113






























                                                    draft saved

                                                    draft discarded




















































                                                    Thanks for contributing an answer to Stack Overflow!


                                                    • Please be sure to answer the question. Provide details and share your research!

                                                    But avoid



                                                    • Asking for help, clarification, or responding to other answers.

                                                    • Making statements based on opinion; back them up with references or personal experience.


                                                    To learn more, see our tips on writing great answers.




                                                    draft saved


                                                    draft discarded














                                                    StackExchange.ready(
                                                    function () {
                                                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5055909%2falgorithm-for-n-way-merge%23new-answer', 'question_page');
                                                    }
                                                    );

                                                    Post as a guest















                                                    Required, but never shown





















































                                                    Required, but never shown














                                                    Required, but never shown












                                                    Required, but never shown







                                                    Required, but never shown

































                                                    Required, but never shown














                                                    Required, but never shown












                                                    Required, but never shown







                                                    Required, but never shown







                                                    Popular posts from this blog

                                                    Full-time equivalent

                                                    Bicuculline

                                                    さくらももこ