[No JIRA] Fix inefficient iterators #250

TomMD · 2020-08-18T00:07:44Z

Iterating over a map then looking up a key guarantees 2x the hash map lookups than what is necessary. See here for a more detailed writeup.

This instance was found using the Muse analysis platform (which I work on), specifically with the Infer tool. It appears Apache is pro-CI and the analysis worked out of the box and without configuration. Perhaps we should enable it to run on all pull requests, meaning new bugs would receive comments (example).

I hereby declare this contribution to be licenced under the Apache License Version 2.0, January 2004

Iterating over a map then looking up a key guarentees 2x the hashmap lookups than what is necessary.

bdemers · 2020-08-18T14:49:20Z

Thanks @TomMD!

I'd be interested in hearing more. Especially the rate of false positives (and how easy it is to mark them as such).
For example, I really like the OWASP Dependency Check plugin, but it has a high rate of false positives (and it's a PITA to work around them, i.e. hacking up an XML file). Same thing with SpotBugs (though, that has a low rate of false positives)

TomMD · 2020-08-18T15:58:34Z

@bdemers The goal of Muse is to be a platform that is easy for tool authors and developers alike. For this repo it ran infer, errorprone (only the errors, not the warnings or lesser messages), and findsecbugs. There is a tool API so if the default set isn't large enough you can bring your own tool (or just tell me what you'd like to see and I'll try to get to it).

Workflow wise, each of these checks are ran on PRs and only new bugs get posted as comments - so anything pre-existing (such as prior false positives) won't disrupt the dev workflow. Empirically, it's pretty quiet (that's by design) and comments are easy to ignore/resolve regardless of being true or false positives.

The console (details link from github) is available with the full gory detail of each bug from each tool, but if they weren't commented to github then they aren't relevant to the current PR.

You can make configurations to further reduce false positives. If you are familiar with a tool's configuration (ex: infer's .inferconfig file) then that is absolutely the right place to make tool-specific adjustments. Alternatively, you can globally set which issues are ignored with a .muse/config.toml filtering out the loudest issues or even disabling the noisy tools.

The false positive rate varies a lot with the code base. Looking at the results on Shiro things appear to be decent especially if we mute the "missing override" warning from Errorprone (.muse/config.toml of ignore = ["MissingOverride"]).

fpapon · 2020-08-18T16:01:41Z

@TomMD what is the difference with a tool like Sonarqube?

TomMD · 2020-08-18T16:12:26Z

@fpapon Sonarqube as in the open source Java checker tool is shallower - it runs file-by-file and reports linter-on-steroids results and usually pretty verbosely. Compare, say, with Infer which is heavier (requires compilation to run) but can detect bugs that are inter procedural and cross compilation units (in different files). Compared with ErrorProne I found SQ's results less often actionable - when Google talks about reducing false positives in errorprone they are quite serious. More-over, if you like the results then the tool could be called as one of the many from within Muse (and I'd be interested in helping and understanding your preferred experience).

Sonarqube as in the platform integrates quite differently (or, the trials I've experienced are different). It expects to be used in batch and the center of the dev's attention. Muse tries to stay out of the way and inject the relevant issues as comments, not assume or demand anyone visit yet another site during code review.

fpapon · 2020-08-18T16:31:48Z

@TomMD ok, thanks for the explanations.

bmarwell · 2020-08-18T17:47:39Z

Interesting! Sounds like yet another "best of compilation" of many code checking tools. I really like tools which can be part of a fast commit hook and/or part of the validate and verify phases.
Ideally, code with bad habits should not be merged in the first place.

In any case, I'd like to see a jira issue for this, like "enabling muse code checking" or similar.

I have no strong opinion about which tool to use. As long as it fits our needs, why not?

bdemers · 2020-08-18T17:54:44Z

I'm up for trying it out. One concern I have is being able to run the scan locally. For example, I use SpotBugs (and FindSecBugs) on a bunch of projects and being able to run the scan locally (to confirm the issues are fixed is great)
And of course, IDE warnings can be configured before you get that far

fpapon · 2020-08-18T18:30:05Z

@TomMD Is it free for OSS projects?

TomMD · 2020-08-18T18:46:50Z

@fpapon It is free now and always for open source projects. We are pre-sales right now so it is free to private repositories for just a bit longer too, as is the on premise version (though you need to talk to us for that one).

@bdemers I hear that. We're small and honestly still not solidly sure on how best to get developers the tools on their own system. One solution we have beta for is a command line tool that sends code to a server which runs the analysis and hosts the results (indexed by a job ID). Many of the tools aren't so quick as to be editor friendly so an LSP integration might not happen soon, but there are LSP-like ideas out there in support of jobs with longer cycles such as analysis passes.

TomMD · 2020-08-20T17:13:14Z

@bmhm Would you like me to open a Jira ticket then?

Fix inefficient iterators

638e639

Iterating over a map then looking up a key guarentees 2x the hashmap lookups than what is necessary.

bdemers merged commit f819f32 into apache:master Oct 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[No JIRA] Fix inefficient iterators #250

[No JIRA] Fix inefficient iterators #250

TomMD commented Aug 18, 2020

bdemers commented Aug 18, 2020

TomMD commented Aug 18, 2020

fpapon commented Aug 18, 2020

TomMD commented Aug 18, 2020

fpapon commented Aug 18, 2020

bmarwell commented Aug 18, 2020

bdemers commented Aug 18, 2020

fpapon commented Aug 18, 2020

TomMD commented Aug 18, 2020

TomMD commented Aug 20, 2020

[No JIRA] Fix inefficient iterators #250

[No JIRA] Fix inefficient iterators #250

Conversation

TomMD commented Aug 18, 2020

bdemers commented Aug 18, 2020

TomMD commented Aug 18, 2020

fpapon commented Aug 18, 2020

TomMD commented Aug 18, 2020

fpapon commented Aug 18, 2020

bmarwell commented Aug 18, 2020

bdemers commented Aug 18, 2020

fpapon commented Aug 18, 2020

TomMD commented Aug 18, 2020

TomMD commented Aug 20, 2020