9
\$\begingroup\$

I was trying out leetcode's first missing positive.
As per the challenge's description:

Given an unsorted integer array nums, return the smallest missing positive integer.

You must implement an algorithm that runs in O(n) time and uses O(1) auxiliary space.

My solution passes all tests, except last five i.e. where the nums array's size is \$10^5\$. When the array is of that size I get "Time Limit Exceeded".

Here is my solution: (PS: class and methods naming are kept as given by leetcode)

class Solution {
  public int firstMissingPositive(int[] nums) {
    int minIndex = 0;
    int integerCounter = 0;

    for (int i = minIndex; i < nums.length; i++) {
        if (i == 0) {
            integerCounter += 1;
        }

        if (nums[i] == integerCounter) {
            // swap
            int temp = nums[minIndex];
            nums[minIndex] = nums[i];
            nums[i] = temp;

            minIndex += 1;
            i = minIndex - 1;

            integerCounter++;
        }
    }

    return integerCounter > 0 ? integerCounter : -1;
  }
}

My Question
How can I improve this algorithm to tackle super large (e.g. \$10^5\$) nums array without getting the "Time Limit Exceeded" and without having to completely modify the function?

Graphical explanation of my solution using an example
First Missing Positive Flow with example

\$\endgroup\$
15
  • 2
    \$\begingroup\$ I don't think your edit invalidates anything in the existing answer, but generally, be careful with edits once an answer is posted. \$\endgroup\$
    – greybeard
    Commented Feb 28 at 21:01
  • 1
    \$\begingroup\$ IMO your edit invalidates the answer: "The only clue the code presented offers as to what it is all about is the name of the sole public instance method." and "document everything important, and everything "externally visible"". I have rolled back your edit, please see the link in greybeard's comment for more information. \$\endgroup\$
    – Peilonrayz
    Commented Feb 28 at 21:08
  • 2
    \$\begingroup\$ Your algorithm is not O(n) in time, so arguably it should fail this way - you need a more clever algorithm to beat the challenge! If you had every positive number (up to the length of the list), then each number would be pointing at an index in the list (adjust for 0-based indexing). These numbers form "chains" of one number pointing at the index of the next number. If there are numbers "missing", then there must also be some numbers that are "too big", and the answer lies along the chain that led up to that one. \$\endgroup\$ Commented Feb 29 at 15:59
  • 2
    \$\begingroup\$ @qwr That will probably pass too. The time and memory limits are not set up well for this problem. \$\endgroup\$ Commented Mar 1 at 1:07
  • 3
    \$\begingroup\$ FYI, 10^5 integers isn't "super large" by modern standards. For 4-byte int, that would only be 390 KiB, and will fit in L2 cache on recent CPUs. "super large" or "huge" to me implies much larger than L3 cache sizes, like many megabytes, maybe a few gigabytes of data. (Or depending on context, like HPC number crunching, many gigabytes; high-end HPC systems can have terabytes of RAM.) This size is medium, or the lower end of large. \$\endgroup\$ Commented Mar 1 at 17:43

4 Answers 4

6
\$\begingroup\$

One way to solve this within the given constraints is to use the input array to store which numbers have been seen. Negative numbers will never contribute to the answer, so we can iterate once to replace them with 0 (which will help with the next step).

Let us use nums[i] to mark whether the integer i + 1 appears in nums. To prevent overwriting elements we might need later, if we need to indicate the existence of a value i + 1, we can make the element at index i negative. Then, we can recover the original value of an element by taking the absolute value (with Math.abs). The longest possible consecutive sequence of positive numbers would be the range [1, nums.length] (which can only occur when all elements are distinct), so we can ignore any elements outside that range.

We make another pass over the array to mark the seen elements that could contribute to a consecutive sequence of positive numbers. When marking an element, if the value at the position is positive, we multiply it by -1 to make it negative. If it is 0, we set the value to a negative value such that its absolute value is outside the range of [1, nums.length] to ensure it is not considered part of the answer, e.g. -nums.length - 1. (Alternatively, we can set the value to the value seen on the current iteration multiplied by -1, as having the same number appear multiple times does not affect the answer.)

Finally, we loop over the array once more, finding the longest prefix of negative numbers (which indicates the longest consecutive sequence of positive numbers we can make from numbers in the array starting from 1).

Sample implementation:

public int firstMissingPositive(int[] nums) {
    for (int i = 0; i < nums.length; i++) nums[i] = Math.max(nums[i], 0);
    for (int i = 0; i < nums.length; i++) {
        int actualVal = Math.abs(nums[i]);
        if (actualVal > 0 && actualVal <= nums.length) {
            if (nums[actualVal - 1] == 0) nums[actualVal - 1] = ~nums.length;
            else if (nums[actualVal - 1] > 0) nums[actualVal - 1] *= -1;
        }
    }
    int i = 0;
    while (i < nums.length && nums[i] < 0) ++i;
    return i + 1;
}
\$\endgroup\$
5
  • \$\begingroup\$ Yep, seems best O(n) (and O(1) memory overhead) solution so far. Was trying to rewrite my code to have it O(N), got worse than the prior O(N^2) solutions as I started to struggle with edge cases. Thank you for sharing your solution, will be a good learning opportunity for me. \$\endgroup\$
    – ccot
    Commented Mar 1 at 11:51
  • \$\begingroup\$ @ccot Happy to help. \$\endgroup\$ Commented Mar 1 at 12:17
  • 1
    \$\begingroup\$ This is the intended solution I think (assuming you are allowed to overwrite the input array and this counts as O(1) memory; you could have an online algorithm that truly is O(1) memory) \$\endgroup\$
    – qwr
    Commented Mar 1 at 13:58
  • 1
    \$\begingroup\$ @qwr This is a fairly template sort of problem, where the solution usually involves modifying the input to store values in some way. \$\endgroup\$ Commented Mar 1 at 14:04
  • 1
    \$\begingroup\$ @qwr: The key phrase in the original problem statement is "O(1) auxiliary space", implying constant space separate from the input, so this is allowed if you choose not to implement it as an "online" stream reader. Destroying the input without being explicitly told you can do that seems a bit fishy to me, but apparently that's normal for this sort of problem. \$\endgroup\$ Commented Mar 1 at 17:50
9
\$\begingroup\$

The only clue the code presented offers as to what it is all about is the name of the sole public instance method.

Do not follow bad examples, e.g. set by challenge sites. Commendable habits:

  • name essential things descriptively
  • document everything important, and everything "externally visible"
/** Determine the smallest positive integer not in an array
 *   where the array elements are not necessarily ordered.
 */// https://leetcode.com/problems/first-missing-positive/description/
class SmallestPositiveIntNotInArray {
    /** @return the smallest positive integer not in <code>nums</code>. */
    // required to run in O(nums.length) time and O(1) additional space.
    public int firstMissingPositive(int[] nums) {
        throw new UnsupportedOperationException(
                  "firstMissingPositive() not implemented (yet)");   
    }
}
class Solution extends SmallestPositiveIntNotInArray {}

The for-loop looks an ordinary "counting" loop, but isn't:
give an advance alert!

    // i manipulated non-monotonically in loop body
    for (int i = minIndex; i < nums.length; i++) {
        …
            i = minIndex;
            minIndex += 1;
        …
\$\endgroup\$
6
\$\begingroup\$

You search for a "1" first, starting at index 0, exchange it with the element at index 0, if found, and then search for "2", starting at index 1, a.s.o.

That is something similar to Exchange sort, but optimized for the specific needs of the problem, but still has the O(\$n^2\$) worst case run time.

To see this, look what happens when the array has the numbers \$1,2,\ldots,n\$ in reverse order! You take n steps to find the "1", then n-2 steps to find the "2", a.s.o. Once you've found the numbers until \$\frac{n}2\$, the remaining part goes fast as the array is then ordered.

The number of steps through the array is \$n+(n-2)+(n-4)+\ldots+n/2\$, which is roughly \$\frac{3}{16}n^2\$ (CORRECTED, was \$\frac{}8\$ before), which is not what the problem wants.

Considering that general sorting algorithms take O(\$n\log n\$) time, even changing to a better (general) sorting algorithm would not help conceptually (it might help get past the timing tests).

\$\endgroup\$
4
  • \$\begingroup\$ thanks for the detailed feedback, learned a lot from you , wasn't aware of exchange sort. What if, instead of a general sorting algorithm, in my code i check let's say if array >1000 in size, then I launch some stream ie get the 10^5 array in chunks, and i run my algorithm on each chunk: if first missing positive is missing in a chunk -> return it and halt program; else move to next chunk and so on? \$\endgroup\$
    – ccot
    Commented Feb 29 at 12:16
  • 5
    \$\begingroup\$ @ccot While you could speed up the algorithm with parallelism, this probably isn't what this challenge wants to teach you. The goal is to find an efficient algorithm, not to make an inefficient algorithm faster. Besides, the complexity remains O(N^2) even after parallelization, the constant speedup factor won't change this. \$\endgroup\$ Commented Feb 29 at 12:38
  • \$\begingroup\$ If you could use O(n) space, then radix sort works (I think). Maybe you can modify from there. \$\endgroup\$
    – qwr
    Commented Mar 1 at 0:44
  • \$\begingroup\$ @Green绿色 yeah indeed, you are right. \$\endgroup\$
    – ccot
    Commented Mar 1 at 11:51
4
\$\begingroup\$

You solve this by using the original array to store whether values are present. There are 3 "paths" for each index.

  1. You will leave the item alone at nums[i] then i++and continue
    • This is the case if nums[i] == 0 or nums[i] = i + 1
  2. You will set nums[i] = 0 then i++and continue
    • This is the case when, nums[i] is negative, or nums[i] > nums.length
  3. You will swap nums[i] and nums[nums[i] - 1] (DO NOT INCREASE INDEX WITH i++)
    • this is the default case

Then, you loop over you array and check if nums[i]==0 return i+1. If you make it through the whole loop return nums.length+ This uses O(1) memory and O(n) runtime.

public class Solution {
    public int firstMissingPositive(int[] nums) {
        int length = nums.length;
        
        for (int i = 0; i < length;) {
            if (nums[i] == 0 || nums[i] == i + 1) {
                i++;
                continue;
            }

            if (nums[i] > length || nums[i] < 0 || nums[nums[i] - 1] == nums[i]) {
                nums[i] = 0;
                i++;
                continue;
            }

            int temp = nums[nums[i] - 1];
            nums[nums[i] - 1] = nums[i];
            nums[i] = temp;
        }
        
        for (int i = 0; i < length; i++) {
            if (i + 1 != nums[i]) {
                return i + 1;
            }
        }
        return length + 1;
    }
}
\$\endgroup\$
2
  • 1
    \$\begingroup\$ To convert the given Go code to Java - What Go code? Is this copy/pasted from somewhere? The code in your bullet list looks like valid Java syntax, and you just described swapping in words, not using tuple assignment. \$\endgroup\$ Commented Mar 1 at 20:15
  • \$\begingroup\$ Yeah i converted the code from go and forgot to remove that, it's removed now. \$\endgroup\$
    – carsongh
    Commented Mar 2 at 14:15

Not the answer you're looking for? Browse other questions tagged or ask your own question.