5
\$\begingroup\$

Given a string s, reverse only all the vowels in the string and return it.

The vowels are 'a', 'e', 'i', 'o', and 'u', and they can appear in both lower and upper cases, more than once.

My code works to reverse the vowels of a string; however, when given a large string, it fails due to "Time Limit Exceeded". How can I make this more efficient so it can handle large inputs?

def reverseVowels(self, s: str) -> str:
        v = []
        vowels = ['a', 'e', 'i', 'o', 'u']

        for c in s:
            if c.lower() in vowels:
                v.append(c)

        v.reverse()
        s = list(s)
        j = 0

        for i in range(len(s)):
            if s[i] in v:
                s[i] = v[j]
                j += 1
        s = "".join(s)
                
        return s

Example 2:

Input: s = "leetcode"
Output: "leotcede"

\$\endgroup\$
1

8 Answers 8

10
\$\begingroup\$

This is your mistake: if s[i] in v:

Simply change that to if s[i].lower() in vowels:. Then it easily gets accepted where I assume you do it.

The problem is that v can be large, making that check slow. For example for input s = 'a'*150000 + 'e'*150000. Your v will be the reverse, so every 'a' will take long to be found in it, as the search goes through all those 'e'. You take quadratic time and it should be linear.

Alternatively, prepare V = set(v) between the loops and use if s[i] in V:.

Bonus: a regex+slicing solution (fastest in my benchmark, fastest alternative I've found is ~27% slower):

def reverseVowels(self, s: str) -> str:
    a = re.split('([aeiouAEIOU])', s)
    a[1::2] = a[-2::-2]
    return ''.join(a)
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Yes, the quadratic time property was the single most important thing to fix IMO; it's what makes the code not-scalable. All the rest is minor in comparison. \$\endgroup\$
    – Don Hatch
    Commented Jul 4 at 12:39
9
\$\begingroup\$

A set is slightly faster when using in, and a comprehension is slightly faster than a loop with append()

enumerate() allows recording of both the vowel and the vowel's index, ensuring only the vowels are iterated over the second time.

Then you can process just the first half of the list along side the second half of the list (reversed), swapping the characters. This has the side benefit of skipping the middle vowel if there is an odd number.

vowels = [
    (c, ix)
    for ix, c in enumerate(s)
        if c.lower() in {'a', 'e', 'i', 'o', 'u'}
]

s = list(s)

half = len(vowels) // 2

vowel_pairs = zip(
    vowels[:half]
    vowels[-half:][::-1]
)

for a, b in vowel_pairs:
    s[a[1]] = b[0]
    s[b[1]] = a[0]

s = "".join(s)

Demo : https://trinket.io/python3/681a1fb83d

New contributor
MatBailie is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
\$\endgroup\$
7
  • 1
    \$\begingroup\$ An alternative that splits the string in to alternating parts (consonants,vowel, consonants,vowel,...,vowel,consonants) then uses slicing to reverse just the vowels. This makes the setup slower, but moves the second loop in to the runtime. trinket.io/python3/cfcbcf16e1 \$\endgroup\$
    – MatBailie
    Commented Jul 3 at 17:47
  • 2
    \$\begingroup\$ @MatBailie That (alternating parts) is how I did it, but with regex. Much simpler. \$\endgroup\$
    – no comment
    Commented Jul 3 at 17:49
  • 1
    \$\begingroup\$ @nocomment I'm very rusty at regex, what would the expression be? \$\endgroup\$
    – MatBailie
    Commented Jul 3 at 17:55
  • 3
    \$\begingroup\$ parts = re.split('([aeiouAEIOU])', s) \$\endgroup\$
    – no comment
    Commented Jul 3 at 17:57
  • 1
    \$\begingroup\$ @nocomment Wow, I would have expected 'abba' to come out as ['a','bb','a']. I really should have tried it rather than assumed. \$\endgroup\$
    – MatBailie
    Commented Jul 3 at 18:02
6
\$\begingroup\$

You could convert vowels into a set for a slight improvement, since sets are quicker to check for a certain value than lists.

For your second loop, you don't need to check every character of the input again. Since you already found the vowels, you can also store the location where you found them in your first loop:

vowels = set(['a', 'e', 'i', 'o', 'u'])
s_chars = list(s)

vowels_in_s = []
vowel_indexes_in_s = []
for idx, char in enumerate(s_chars):
    if char.lower() in vowels:
        vowels_in_s.append(char)
        vowel_indexes_in_s.append(idx)

Then to add the vowels in their reversed places, you just reverse the indexes:

vowel_indexes_in_s.reverse()

and replace the vowels using their new indexes:

for idx, char in zip(vowel_indexes_in_s, vowels_in_s):
    s_chars[idx] = char

zip is a useful function that lets you build combined data structures from separate lists. Here we're taking a list of vowels, and a list of indexes, and building [[vowel, index], [vowel, index], ...].

Finally you join the characters like you did before. Add it all together and:

def reverse_vowels(s):
    vowels = set(['a', 'e', 'i', 'o', 'u'])
    s_chars = list(s)

    vowels_in_s = []
    vowel_indexes_in_s = []
    for idx, char in enumerate(s_chars):
        if char.lower() in vowels:
            vowels_in_s.append(char)
            vowel_indexes_in_s.append(idx)

    vowel_indexes_in_s.reverse()

    for idx, char in zip(vowel_indexes_in_s, vowels_in_s):
        s_chars[idx] = char

    return ''.join(s_chars)
\$\endgroup\$
3
\$\begingroup\$

Benchmarks

Highly likely this is from LeetCode, so I benchmarked our solutions there. Times from three submissions:

 80.5  77.4  77.6 ms  Jan_B
 12.9  12.5  12.8 ms  MatBailie
  7.8   8.0   8.3 ms  Schmuddi
 11.6  11.7  11.8 ms  Simon_Brahan
 22.1  22.1  22.1 ms  no_comment_minimal_fix_1
 15.7  15.7  15.8 ms  no_comment_minimal_fix_2
  2.7   2.7   2.8 ms  no_comment_numpy
  5.2   4.9   5.3 ms  no_comment_regex

My NumPy solution inspired by Eman Yalpsid's answer takes advantage of the input strings guaranteed to be ASCII.

As LeetCode's time measurements are bad, I did my own, running all solutions on all 480 test cases and printing total times at the end. Full code as submitted

import numpy as np

class Solution:

    def reverseVowels(self, s: str, timess=[]) -> str:
        from time import perf_counter as time
        funcs = [
            self.no_comment_minimal_fix_1,
            self.no_comment_minimal_fix_2,
            self.no_comment_regex,
            self.no_comment_numpy,
            self.Jan_B, 
            self.MatBailie,
            self.Simon_Brahan,
            self.Schmuddi,
        ]
        names = sorted(f.__name__ for f in funcs)
        
        # Measure each solution on this test case
        times = {f: [] for f in names}
        n = 1
        for _ in range(10):
            shuffle(funcs)
            for f in funcs:
                t = time()
                for _ in range(n):
                    f(s)
                times[f.__name__].append(time() - t)
        timess.append(times)
        
        # Check correctness
        result = funcs[0](s)
        for f in funcs:
            if f(s) != result:
                return None

        # After all test cases, show stats
        if len(timess) == 480:
            for f in names:
                print(f'{sum(min(times[f]) for times in timess) / n * 1e3:5.1f} ms ', f)
            return 'intentionally wrong answer so we get to see the output'

        return result

    
    def no_comment_minimal_fix_1(self, s: str) -> str:
        v = []
        vowels = ['a', 'e', 'i', 'o', 'u']

        for c in s:
            if c.lower() in vowels:
                v.append(c)

        v.reverse()
        s = list(s)
        j = 0

        for i in range(len(s)):
            if s[i].lower() in vowels:
                s[i] = v[j]
                j += 1
        s = "".join(s)
                
        return s

    
    def no_comment_minimal_fix_2(self, s: str) -> str:
        v = []
        vowels = ['a', 'e', 'i', 'o', 'u']

        for c in s:
            if c.lower() in vowels:
                v.append(c)

        v.reverse()
        V = set(v)
        s = list(s)
        j = 0

        for i in range(len(s)):
            if s[i] in V:
                s[i] = v[j]
                j += 1
        s = "".join(s)
                
        return s

    
    def MatBailie(self, s: str) -> str:
        vowels = [
            (c, ix)
            for ix, c in enumerate(s)
                if c.lower() in {'a', 'e', 'i', 'o', 'u'}
        ]

        s = list(s)

        half = len(vowels) // 2

        vowel_pairs = zip(
            vowels[:half],
            vowels[-half:][::-1]
        )

        for a, b in vowel_pairs:
            s[a[1]] = b[0]
            s[b[1]] = a[0]

        return "".join(s)


    def Simon_Brahan(self, s: str) -> str:
        vowels = set(['a', 'e', 'i', 'o', 'u'])
        s_chars = list(s)

        vowels_in_s = []
        vowel_indexes_in_s = []
        for idx, char in enumerate(s_chars):
            if char.lower() in vowels:
                vowels_in_s.append(char)
                vowel_indexes_in_s.append(idx)

        vowel_indexes_in_s.reverse()

        for idx, char in zip(vowel_indexes_in_s, vowels_in_s):
            s_chars[idx] = char

        return ''.join(s_chars)

    
    def no_comment_regex(self, s: str) -> str:
        a = re.split('([aeiouAEIOU])', s)
        a[1::2] = a[-2::-2]
        return ''.join(a)
    
    
    def no_comment_numpy(self, s: str) -> str:
        b = bytearray(s, 'ascii')
        a = np.frombuffer(b, dtype=np.uint8)
        mask = self.vowel[a]
        a[mask] = a[mask][::-1]
        return b.decode('ascii')
    vowel = np.array([chr(i) in 'aeiouAEIOU' for i in range(128)])

    
    def Jan_B(self, s: str) -> str:
        # set instead of list with both lower and upper case to omit .lower()
        vowels = {'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'}

        # list comprehension instead of loop
        v = [c for c in s if c in vowels]

        # reverse function instead of slicing
        v.reverse()

        # adding to string instead of converting to and from list
        result = ''

        # looping directly over the string
        for char in s:
            if char in vowels:
                char = v.pop(0)
            result += char

        return result


    def Schmuddi(self, s: str) -> str:
 
        # create a set of all potential vowel characters
        vowels = set("aeiouAEIOU")

        # create a generator that will yield the vowels in reverse for each
        # time it's processed:
        backward_vowels = (ch for ch in s[::-1] if ch in vowels)

        # generate a list of characters where each character is either the
        # unaltered character from the original string, or the next character
        # yielded by the generator of reversed vowels (note that the list is
        # returned as a string):
        return "".join(
            [ch if ch not in vowels else next(backward_vowels) 
             for ch in s
            ])

Summarizer script (with fewer results):

from collections import defaultdict

results = '''
 77.7 ms  Jan_B
 13.0 ms  MatBailie
 76.8 ms  Jan_B
 12.5 ms  MatBailie
 78.7 ms  Jan_B
 12.8 ms  MatBailie
'''

times = defaultdict(list)
for line in results.splitlines():
    if line:
        times[line[10:]].append(line[:5])

for name, ts in times.items():
    print(*ts, 'ms ', name)

Attempt This Online!

\$\endgroup\$
4
  • \$\begingroup\$ @Schmuddi Yeah I focused on the LeetCode tests only. Among the many alternatives I tried was one where I precompile the regex, it was only slightly faster in my benchmark but would likely help more in yours. I also tried one almost identical to yours, but it took around 8.2 ms. Hmm, you got 3 ns per loop? That's suspiciously fast, almost nothing is that fast in Python. Could you share your benchmark script? \$\endgroup\$
    – no comment
    Commented Jul 5 at 2:26
  • \$\begingroup\$ @Schmuddi I don't think it caches. And even that would be suspiciously fast. Did you copy&paste the times? Or did you type "ns" when it was actually "μs"? \$\endgroup\$
    – no comment
    Commented Jul 5 at 6:49
  • \$\begingroup\$ @Schmuddi Ah, so you used the number from the mean and the unit from the stddev? Why is it 1.36 now, much faster than before? If your times vary that much, you can't really say which one is faster. And in my own testing, I consistently get times around 2.0 μs with mine, 2.3 μs with yours, and 1.6 μs with mine using prepared splitter: sample results and code \$\endgroup\$
    – no comment
    Commented 2 days ago
  • \$\begingroup\$ Sorry, I just realized that I was extremely stupid and not really paying attention to what I was doing. Please ignore all of my comments about timing, and apologies for wasting your time... \$\endgroup\$
    – Schmuddi
    Commented 2 days ago
2
\$\begingroup\$

The logic behind your function is this:

  1. Create a list of the vowels in the target string
  2. Reverse this list
  3. Iterate through the indices pointing to each character of the target string
  4. For each index, check if the indexed character is a vowel
  5. If yes, retrieve the next vowel from the list of reversed vowel, and replace the indexed character by that retrieved vowel
  6. Otherwise, don't change the indexed character

This logic isn't bad, but your implementation is not the most efficient one. Here are suggestions that improve the performance:

  • Avoid for loops, and use list comprehensions and generators instead
  • Iterate through the string directly instead of using numeric indices
  • Avoid using functions inside of loops, list comprehensions, and generators
  • Use sets to look up characters instead of lists
  • Instead of reversing lists, work on reversed strings instead

Here is my version of the task that makes use of these suggestions. Note that it still follows the same logic as your own solution, but is quite performant. For short to medium strings, it can compete with the bonus answer provided by @no_comment.

def revowel(s: str) -> str:

    # create a set of all potential vowel characters
    vowels = set("aeiouAEIOU")

    # create a generator that will yield the vowels in reverse for each
    # time it's processed:
    backward_vowels = (ch for ch in s[::-1] if ch in vowels)

    # generate a list of characters where each character is either the
    # unaltered character from the original string, or the next character
    # yielded by the generator of reversed vowels (note that the list is
    # returned as a string):
    return "".join(
        [ch if ch not in vowels else next(backward_vowels) 
         for ch in s
        ])
\$\endgroup\$
2
\$\begingroup\$

All good things have already been said, alas.

Perhaps a solution using numpy, where you can use arrays to select members of arrays.

import numpy as np

def reverse_vowels(s: str) -> str:
    ss = np.array(list(s))
    vowels = set("aeiouAEIOU")
    
    vowel_mask = np.vectorize(lambda c: c in vowels)(ss)

    ss[vowel_mask] = ss[vowel_mask][::-1]
    
    return "".join(ss)

Perhaps a solution using iterators, but it's essentially a slightly more general version of Schmuddi's answer.

from itertools import filterfalse

def intersperse_by(key_iterator, iterators):
    for key in key_iterator:
        yield next(iterators[key])

def reverse_vowels(s: str) -> str:
    vowels = set("aeiouAEIOU")
    
    def is_vowel(c): return c in vowels
    
    return "".join(
                intersperse_by(
                    map(is_vowel, s), [
                        filterfalse(is_vowel, s),
                        filter(is_vowel, reversed(s)),
                    ]
                )
            )

Reviewing my own code, perhaps it could be useful for someone else. I've looked at runtimes for strings of lengths n=10**k, both of my approaches are linear in n. Which is both expected and good. Below, I'm using the runtimes encountered for a random string of length 1_000_000.

no_comment's regexp based solution takes 75 ms. My numpy based solution takes 796 ms, the iterator-based one 452 ms. I don't expect to be able to beat the regexp-based answer, but surely there's room for improvement.


Defining a function, and calling it is costly. Indeed,

vowels = set("aeiouAEIOU")

def is_vowel(c): return c in vowels

could have been

is_vowel = set("aeiouAEIOU").__contains__

and this simple change cuts the runtime of the iterator-based answer from 452 ms to 257 ms.


Numpy is great, but not for string processing. Duh. Even the simple act of converting the string to an array and back to the string,

"".join(np.array(list(s)))

takes 520 ms, the somewhat better line,

"".join(np.fromiter(s, dtype='<U1'))

takes 437 ms.

Picking more numpy-friendly datatypes,

"".join(map(chr, np.fromiter(map(ord, s), dtype=np.uint8)))

we can reduce this to 195 ms.

which is better, but still won't win us any awards. Nonetheless, using this idea gets the runtime down from 795 ms to 275 ms.


The improved codes:

def reverse_vowels(s: str) -> str:
    vowel_mask = np.fromiter(map(set("aeiouAEIOU").__contains__, s), np.bool)
    ss = np.fromiter(map(ord, s), dtype=np.uint8)

    ss[vowel_mask] = ss[vowel_mask][::-1]
    return "".join(map(chr, ss))

def reverse_vowels(s: str) -> str:
    is_vowel = set("aeiouAEIOU").__contains__

    vowel_mask = map(is_vowel, s)
    consonants = filterfalse(is_vowel, s)
    vowels_reversed = filter(is_vowel, reversed(s))
    
    return "".join(intersperse_by(vowel_mask, (consonants, vowels_reversed)))
\$\endgroup\$
6
  • \$\begingroup\$ If you assume ASCII like it is on LeetCode, you could encode the string to bytes and create a NumPy array from that. That should be fast. \$\endgroup\$
    – no comment
    Commented Jul 5 at 18:39
  • \$\begingroup\$ And for the mask, it might be faster to not use a set but find each vowel's locations separately and "or" them. \$\endgroup\$
    – no comment
    Commented Jul 5 at 18:46
  • \$\begingroup\$ Or use a lookup-array with 128 bools telling whether the characters are a vowel. \$\endgroup\$
    – no comment
    Commented Jul 5 at 18:49
  • 1
    \$\begingroup\$ Try this. \$\endgroup\$
    – no comment
    Commented Jul 5 at 19:12
  • 1
    \$\begingroup\$ @nocomment That's smart. Great! Yep, it's by far the fastest (~16 ms in the context of my answer). \$\endgroup\$ Commented Jul 5 at 23:29
1
\$\begingroup\$

I instantly thought about looping the characters in the string directly instead of looping by the index and also wanted to test just adding to a string instead of the conversion to and from a list. Reading the other answers I also implemented a set instead of a list containing both upper and lower case vowels. I'm not 100% sure if a list comprehension is always faster than a for loop but it at least looks cleaner to me.

So here is my approach:

def reverse_vowels_fast(s: str) -> str:
    # set instead of list with both lower and upper case to omit .lower()
    vowels = {'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'}

    # list comprehension instead of loop
    v = [c for c in s if c in vowels]

    # reverse function instead of slicing
    v.reverse()

    # adding to string instead of converting to and from list
    result = ''

    # looping directly over the string
    for char in s:
        if char in vowels:
            char = v.pop(0)
        result += char

    return result

I timed it in comparison to the solutions of @MatBailie and @no comment with test_str = 'A slightly longer string to process to see the difference in time better. But only tested manually.'

Function: original_reverse_vowels, execution time: 49700ns
---------
Function: reverse_vowels_no_comment, execution time: 155100ns
Function: reverse_vowels_mat_bailie, execution time: 18100ns
Function: reverse_vowels_fast, execution time: 14400ns
\$\endgroup\$
6
  • 1
    \$\begingroup\$ Building the result string like that can take quadratic time. It depends on implementation details. Not a good idea, and certainly shouldn't be called "fast". Likely they did this at LeetCode, which allows the strings to have up to 300,000 characters. What times do you get for that? Measuring with a worst case string like my example string would also be good. \$\endgroup\$
    – no comment
    Commented Jul 4 at 11:39
  • 4
    \$\begingroup\$ The natural place to pop from a list is the end of it, not the front. .pop(0) will move all the items in the list one position to the left, so it's a slow (O(n)) operation. You can skip the reverse call altogether and simply .pop() (which defaults to the last item) to reverse without reversing. Also, adding to a string is also not a good idea: strings are immutable, so result += char creates a new string and copies the contents of result and char to it, before pointing result to the new string. Appending to a list, then doing ''.join(result_list) is the recommended way to go. \$\endgroup\$
    – Jaime
    Commented Jul 4 at 11:46
  • \$\begingroup\$ @nocomment and Jaime thanks for your input! As written I only "tested" my version with the given string. I am no developer by trade and my projects do almost never need speed optimization so I have not much experience with that. I value your insights as I learned a lot from it! \$\endgroup\$
    – Jan_B
    Commented Jul 5 at 14:08
  • \$\begingroup\$ Btw which of my solutions did you time? \$\endgroup\$
    – no comment
    Commented Jul 5 at 16:27
  • \$\begingroup\$ @nocomment the regex one \$\endgroup\$
    – Jan_B
    Commented 2 days ago
-1
\$\begingroup\$

The biggest problem with the code presented is that it does not document what it is to accomplish:
See how your take and MatBailie's do a substitution based on the position in a/the sequence of vowels where others substitute based on the order of vowels in the string.

In addition to using a set to test case-normalised characters, one could use a set containing each vowel in lower as well as upper case.

One could establish the set of vowels used, but the time save in "the substitution pass" is probably less than what's added in preprocessing.

Alternatives: Iterate from both sides concurrently exchanging vowels "on the fly".
When you reach the middle vowel, you're done -
problem: establishing you've reached the middle vowel.

Establish sequence of vowels and its reverse,
find and substitute each.

\$\endgroup\$
4
  • 3
    \$\begingroup\$ Do you have an example of how you'd manipulate the string, being that strings are immutable on Python? \$\endgroup\$
    – MatBailie
    Commented Jul 4 at 9:50
  • \$\begingroup\$ (@MatBailie: think Python…. But then, search may be faster on strings, so locate in the string, manipulate list.) \$\endgroup\$
    – greybeard
    Commented Jul 4 at 9:56
  • 3
    \$\begingroup\$ Please show code to demonstrate your suggested substitutions. \$\endgroup\$
    – MatBailie
    Commented Jul 4 at 10:32
  • \$\begingroup\$ @MatBailie The code for search in string, manipulate list would culminate in something close to no comment's RE suggestion/solution; similarly, at least some of the other approaches are mentioned or spelled out in other answers. When I feel like it, I may try and figure out which where there before I mentioned them and remove the respective sentence from my answer. \$\endgroup\$
    – greybeard
    Commented Jul 5 at 10:19