2
\$\begingroup\$

I'm exceeding the time limit for a 10,000 word test case provided on LeetCode:

Given an array of strings, group anagrams together.

For example, given: ["eat", "tea", "tan", "ate", "nat", "bat"],
Return:

[
  ["ate", "eat","tea"],
  ["nat","tan"],
  ["bat"]
]

Note: All inputs will be in lower-case.

Any suggestions on how I could speed this up a little bit?

public HashMap<String, List<Integer>> createMap(String[] strs)
{
  HashMap<String, List<Integer>> map = new HashMap<String, List<Integer>>();

    for(String s: strs)
    {
     char[] charArray = s.toCharArray();
     Arrays.sort(charArray);
     String sortedString = new String(charArray);

     if(map.containsKey(sortedString))
     {
        List<Integer> pos = map.get(sortedString);
        pos.add(Arrays.asList(strs).indexOf(s));
        map.put(sortedString, pos);
     }

    else
    {
        List<Integer> pos = new ArrayList<Integer>();
        pos.add(Arrays.asList(strs).indexOf(s));
        map.put(sortedString, pos);
    }
  }

  return map;
}

public List<List<String>> groupAnagrams(String[] strs) {

  HashMap hm = createMap(strs);

  List<List<String>> anagrams = new ArrayList<List<String>>();

  for(Object k: hm.keySet())
  {
    String key = k.toString();
    ArrayList pos = (ArrayList)hm.get(key);

    List<String> a = new ArrayList<String>();
    for(Object p : pos)
    {
      int idx = Integer.parseInt(p.toString());
      a.add(strs[idx]);
    }
    anagrams.add(a);
  }
  return anagrams;
}
\$\endgroup\$
1
  • 4
    \$\begingroup\$ Please include the description of the programming-challenge into the question so the reviewers can see what you're trying to do and why. \$\endgroup\$
    – Mast
    Commented Feb 4, 2017 at 8:24

3 Answers 3

7
\$\begingroup\$

Why so slow

It's important to consider the time complexity of all the operations in your program.

Take a closer looks at this part I extracted from your code:

for(String s: strs)
{
 // ...

    pos.add(Arrays.asList(strs).indexOf(s));

For each s, converting a String[] to a List<String>, in order to use the indexOf method to find the index of s? That doesn't sound so good. How about making that a counting loop?

for (int i = 0; i < strs.length; i++) {
    // ...

    pos.add(i);

But why use indexes at all? Why not make this method return a Map<String, List<String>> instead?

for (String s : strs) {
    // ...

    pos.add(s);

If you do that, then the implementation of the other method becomes simply:

public List<List<String>> groupAnagrams(String[] strs) {
    return new ArrayList<>(createMap(strs).values());
}

Good practices

The code violates many good practices and common conventions:

  • Never use raw types, for example use HashMap<String, List<String>> instead of just HashMap without type parameters
  • Prefer to declare variables with interface types, for example Map<String, List<String>> instead of HashMap<String, List<String>>
  • Use the diamond operator <> when possible, for example new HashMap<> instead of new HashMap<String, List<String>>
  • Use consistent indentation, and please place braces they way I did in my examples above
\$\endgroup\$
2
\$\begingroup\$

Jianmin commented on my answer to a similar question here: Grouping anagrams and it got me thinking about the problem, and also reading this solution.

Janos has some good comments, but I wanted to point out three additional things:

computeIfAbsent

Java Map instances now have the computeIfAbsent function. Code like:

 if(map.containsKey(sortedString))
 {
    List<Integer> pos = map.get(sortedString);
    pos.add(Arrays.asList(strs).indexOf(s));
    map.put(sortedString, pos);
 }

else
{
    List<Integer> pos = new ArrayList<Integer>();
    pos.add(Arrays.asList(strs).indexOf(s));
    map.put(sortedString, pos);
}

should instead be:

List<Integer> pos = map.computeIfAbsent(sortedString, k -> new ArrayList<>());
pos.add(Arrays.asList(strs).indexOf(s));

Of course, you should be using String values and not Integers, as Janos indicated, but the idea is to only have one call in to the Map. See the docs here: computeIfAbsent

function extraction

Your code to compute the "key" from the word, should be extracted in to a separate function. There should be a function:

public static final String keyOf(String word) {
    char[] chars = word.toCharArray();
    Arrays.sort(chars);
    return new String(chars);
}

It clearly is separate, isolated logic, and should be maintained as such.

Streams

Streams have been around for a while, and you should become familiar with them.

I tool the liberty of "solving" the problem using streams, and the code is relatively neat, and concise:

public static final List<List<String>> groupAnagrams(Stream<String> stream) {
    Map<String, List<String>> mapped = stream.collect(Collectors.groupingBy(w -> keyOf(w)));
    return new ArrayList<>(mapped.values());
}

I have put that in to an ideone page here: https://ideone.com/qLiA8c (that ideone code also has code to format it "pretty" ... like the examples above).

\$\endgroup\$
1
  • \$\begingroup\$ These are all very good and helpful points. I'm not so familiar with Streams and Generics functions, but I will incorporate these changes. Thank you very much! \$\endgroup\$ Commented Feb 5, 2017 at 22:21
0
\$\begingroup\$

Make Java code more readable, ready to review

I strongly agree with code review conducted by Janos, code review is not just to share you a workable solution. You also have to learn ways to make code more readable, practice better way using Java language in your case.

Use O(N) solution to generate key, avoid \$O(NLogN)\$ Sorting algorithm

The idea to avoid timeout issue is to generate a key for each anagram group using \$O(N)\$ time complexity, instead of naive one by sorting the string. To take advantage of alphabetic number only has constant of size \$26\$, go through the string once, one char a time, to record the number of occurrence, like a counting sort. Here is the C# code, pass all test cases on leetcode online judge.

Here is the anagram hashed key algorithm in C#, most of important decision is to choose a more efficient sort - counting sort instead of comparison based sorting:

    /* O(n) solution 
     * abc 
     * hashed to key:
     * 01-11-11
     * 0 stands for a, 1 is count of a
     * further simplify the key:
     * 1-1-1
     * first 1 is count of a, 
     * second 1 is count of b, 
     * third 1 is count of c
     * 
     */
    public static string ConvertHashKeys(string input)
    {
        if (input == null || input.Length == 0)
        {
            return string.Empty;
        }

        int[] countAlphabetic = new int[26];

        foreach (char c in input)
        {
            countAlphabetic[c - 'a']++;
        }

        return String.Join("-", countAlphabetic);
    }
\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.