2
\$\begingroup\$

Problem statement: Sort integer array nums in O(N log N) time, without relying on any library sort routine.

I am trying to use a tree sort method (using the creation of a BST and then performing inorder traversal) in order to sort an array of numbers. I have the following code:

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:

        output = []

        def insert(node, val):
            if not node:
                return TreeNode(val)
            
            if val <= node.val:
                node.left = insert(node.left, val)
            else:
                node.right = insert(node.right, val)
            
            return node

        root = TreeNode(nums[0])
        for i in range(1, len(nums)):
            root = insert(root, nums[i])
        
        def inorder(root):
            if not root:
                return
            
            inorder(root.left)
            output.append(root.val)
            inorder(root.right)
        
        inorder(root)

        return output

I am getting a TLE (time limit exceeded) error. I am not sure if there is an optimization to do in order to tighten up time from recursive stack calls in both the insert() function. I am not sure if the recursive solution is not optimal (given the amount of recursive calls used). Perhaps I need to approach with an iterative approach in order to build the BST during the insert() calls?

\$\endgroup\$
1
  • \$\begingroup\$ Merge sort has worst case O(nlogn) time complexity, and in my opinion is easier to code than self balancing trees. \$\endgroup\$ Commented Feb 5 at 13:20

1 Answer 1

4
\$\begingroup\$

edge case

I feel the first result displayed here is reasonable:

>>> sorted([])
[]
>>> sorted([6, 4])
[4, 6]

Given zero elements, and asked to spit them back in ascending order, the proper response is zero elements.

In contrast, your TreeNode(nums[0]) call catastrophically raises IndexError in the empty case.

base case

Your insert function is lovely.

        root = TreeNode(nums[0])

Caller might possibly have preferred to DRY this up with

        root = insert(None, nums[0])

so that only insert needs to worry about that detail.

globals

        def inorder(root):

See, now this is where "nested defs" gets a bad name. Your insert was beautiful. It could have been a separate (testable!) @staticmethod, but it's simple and it fit in nicely.

In contrast, this function should probably have a signature of

        def inorder(output, root):

(Plus, output was perhaps initialized a bit early, so we have nearly forgotten about it by the time it is used.)

Ok, enough about coupling.

time complexity

When you approach a cryptography problem, or an OWASP cyber security problem, or even a simple sort, you have to have a mind-set that the whole world is out to get you. The computer, the internet, and especially the adversary. Which in this case would be the LeetCode folks.

Now, imagine a happy world, with unicorns prancing beneath rainbows. You get N integers to sort. And guess what? It's a random permutation of natural numbers up to N. Life is good, your Binary Tree algorithm can blaze through them, inserting each number in O(log N) time.

But then the next day storm clouds have come out, the shy unicorns have run away, and LeetCode sends you one of these as input:

  • list(range(n))
  • list(reversed(range(n)))

Will insert run in logarithmic time? Heavy sigh! It needs O(N) linear time to chase through those unbalanced nodes. Total sort time is O(N²) quadratic. We are ruined!

Fortunately this is an off-line (batch) sort -- you are given the numbers all at once. It's not like we're trying to maintain an online pqueue or anything. If you want to save the current algorithm, you could enforce the randomness requirement with Fisher-Yates pre-conditioning at linear cost. Not that I've ever seen anyone do that in real production code. Plus, it would behave suboptimally if given N 0's plus a single 1 to sort.

Consider looking into one of the various Balanced binary tree datastructures, such as a heap or the ever popular red-black tree.

\$\endgroup\$
2
  • \$\begingroup\$ Thanks for the explanation. I did run into issues with reversed/already-sorted ranges so I implemented Fisher-Yates to randomize (of course this wouldn't be good programming practice in real-life). However, I noticed for a long list of duplicate values (i.e., [2] * n) Fisher-Yates and a classic BST construction scheme fails. I will try a red-black tree approach and then implement a heap approach in another run. \$\endgroup\$
    – qxzsilver
    Commented Feb 19 at 18:42
  • \$\begingroup\$ Sooo, when there's many duplicates, this approach will tend to have trouble. Of course, so many duplicates implies we can just radix sort, and determining that requires just O(n) linear time to construct a histogram. We are willing to allocate a "small" histogram, much smaller than N, and we bail with fatal error if it turns out to be "large". A limiting case would be [3] + [2]*n. The histogram tells us that only 2 and 3 are present. We output n copies of 2 followed by 1 copy of 3 and we're done. // Or keep a count in each tree node. \$\endgroup\$
    – J_H
    Commented Feb 19 at 19:07

Not the answer you're looking for? Browse other questions tagged or ask your own question.