
fix(plugins/http-log): improve concurrency when max_batch_size is set to 1 #13332

Open · wants to merge 15 commits into master
Conversation

@vm-001 (Member) commented Jul 4, 2024

Summary

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong, or the skip-changelog label has been added to the PR if a changelog is unnecessary (README.md)
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

FTI-6022

@vm-001 vm-001 requested a review from hanshuebner July 4, 2024 05:39
@github-actions bot added labels plugins/http-log and cherry-pick kong-ee (schedule this PR for cherry-picking to kong/kong-ee) Jul 4, 2024
@vm-001 vm-001 marked this pull request as draft July 4, 2024 05:40

if (ngx.now() - start_time) > queue_conf.max_retry_time then
  kong.log.err(string.format(
    "could not send entries, giving up after %d retries. 1 queue entries were lost",
Contributor commented:

As the maximum retry time is configured, it should probably be reported instead of (or in addition to) the number of retries, so that it is clear which knob to turn when this message is shown.

@vm-001 (Member, Author) replied Jul 4, 2024:

Does changing the log message like this make sense to you?

kong.log.err(string.format(
  "could not send entries, giving up after %d retries. 1 queue entry was lost. Please consider tuning max_retry_time",
  retry_count))
break
@vm-001 (Member, Author) replied:

I updated the log message to indicate that max_retry_time was exceeded.

Resolved (outdated) review threads:
  • kong/plugins/http-log/handler.lua (×3)
  • changelog/unreleased/kong/http-log-Improvement.yml
vm-001 and others added 3 commits July 4, 2024 14:50
@hanshuebner (Contributor) commented:

I think the new behavior should be enabled by an explicit configuration parameter (no_queue) rather than implicit by a certain queue length. This would make it easier to understand.
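For illustration, a minimal sketch of what such an explicit switch might look like; no_queue is only the name proposed in this thread, not an existing Kong queue parameter, and the schema fragment is hypothetical:

-- Hypothetical sketch: "no_queue" is the name proposed in this thread,
-- not an existing Kong queue configuration field.
-- In the shared queue schema, an explicit opt-out flag could be added:
{ no_queue = {
    type = "boolean",
    default = false,
    description = "If true, send each entry immediately in its own timer "
               .. "instead of batching it through the queue.",
} }

-- The handler would then branch on the flag rather than on max_batch_size:
if conf.queue.no_queue then
  -- fire one timer per entry, bypassing the queue
else
  -- existing queued, batched path
end

An explicit flag like this keeps max_batch_size purely a batching knob, instead of overloading the value 1 with a behavioral change.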

@vm-001 vm-001 marked this pull request as ready for review July 4, 2024 07:30
@vm-001 vm-001 requested a review from hanshuebner July 4, 2024 08:18
Resolved (outdated) review threads:
  • spec/01-unit/27-queue_spec.lua
  • kong/plugins/http-log/handler.lua
  • spec/03-plugins/03-http-log/01-log_spec.lua (×2)
vm-001 and others added 3 commits July 5, 2024 14:31
Co-authored-by: Hans Hübner <hans.huebner@gmail.com>
Co-authored-by: Hans Hübner <hans.huebner@gmail.com>
@@ -0,0 +1,4 @@
message: |
  **HTTP-Log**: When `queue.max_batch_size` is 1, log entries are now sent in separate, parallel HTTP requests. Previously, they were sent sequentially in FIFO order.
Contributor commented:

Suggested change
**HTTP-Log**: When `queue.max_batch_size` is 1, log entries are now sent in separate, parallel HTTP requests. Previously, they were sent sequentially in FIFO order.
**HTTP-Log**: When `queue.max_batch_size` is explicitly set to `1`, log entries are now sent via concurrent HTTP requests. Previously, they were sent sequentially in FIFO order.
@vm-001 (Member, Author) replied:

The default value of max_batch_size is 1, so the wording "explicitly set" does not apply in this case.

end

if (now() - start_time) > queue_conf.max_retry_time then
  kong.log.err("could not send entries due to max_retry_time exceeded. 1 queue entry was lost")
Contributor commented:

Suggested change
kong.log.err("could not send entries due to max_retry_time exceeded. 1 queue entry was lost")
kong.log.err("could not send this log entry due to max_retry_time exceeded: lost")
@vm-001 (Member, Author) replied:

We now reuse the Queue.handle logic, so the suggestion here is not applicable.

local entry = cjson.encode(kong.log.serialize())

if queue_conf.max_batch_size == 1 then
  local ok, err = timer_at(0, function()
@outsinre (Contributor) commented Jul 12, 2024:

If the TPS is very high, the number of timers would soar. Would that be a problem?

@vm-001 (Member, Author) replied:

I suppose all the timers have been managed by timer-ng since 3.0; in that case, there will be warning logs from timer-ng when too many timers accumulate.
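For context, a sketch of the one-timer-per-entry pattern at issue, together with the standard lua-nginx-module counters that expose timer pressure. The HTTP send itself is elided, and this assumes it runs inside the log handler where entry is in scope:

-- Illustrative sketch of the pattern under discussion; the send body is
-- elided and "entry" is assumed to come from the surrounding handler.
local timer_at = ngx.timer.at

local function send_entry(premature, entry)
  if premature then
    return  -- worker is shutting down, drop the entry
  end
  -- ... perform a single HTTP request for this entry ...
end

-- one timer per log entry; under high TPS this grows with the request rate
local ok, err = timer_at(0, send_entry, entry)
if not ok then
  kong.log.err("failed to create timer: ", err)
end

-- lua-nginx-module counters that reveal timer pressure:
kong.log.debug("pending timers: ", ngx.timer.pending_count(),
               ", running timers: ", ngx.timer.running_count())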

local entry = cjson.encode(kong.log.serialize())

if queue_conf.max_batch_size == 1 then
  local ok, err = timer_at(0, function()
Contributor commented:

Could we extract the timer's logic into a local function? Also, this looks very similar to the queue's process_once method. Perhaps we could expose that logic from there and reuse it, to reduce duplication.

@vm-001 (Member, Author) replied Jul 17, 2024:

I like the idea of function reuse. The Queue then has to expose a function (e.g. Queue:handle) that can be used both internally and externally.

Here is the function-reuse approach: PR #13384. However, we have to refactor the Queue a little to expose:

  • Queue.create(): creates a Queue instance without starting a background timer. This lets us leverage the queue's OOP design (most of the queue's functions are methods, such as queue:log(); it's hard to reuse them without creating a Queue instance :( ).
  • Queue:handle(): a function that contains the retry logic.

@samugi @hanshuebner Let me know your thoughts. The function-reuse approach involves a wider code change and is a bit riskier, so it requires more time for a thoughtful review. (A minimal sketch of the split is shown below.)
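A minimal sketch of the split described above, assuming the names from this comment; the actual refactor lives in PR #13384 and may differ in detail:

-- Illustrative sketch of the proposed Queue.create()/Queue:handle() split;
-- the real refactor is in PR #13384.
local now = ngx.now
local sleep = ngx.sleep

local Queue = {}
Queue.__index = Queue

-- Creates an instance without scheduling the background batching timer,
-- so callers can reuse the OOP methods without starting a queue.
function Queue.create(queue_conf, handler, handler_conf)
  return setmetatable({
    queue_conf   = queue_conf,
    handler      = handler,
    handler_conf = handler_conf,
  }, Queue)
end

-- The retry loop, exposed so it can be called both by the queue's internal
-- processing and directly from a plugin's timer callback.
function Queue:handle(entries)
  local start_time = now()
  local retry_count = 0
  while true do
    local ok, err = self.handler(self.handler_conf, entries)
    if ok then
      return true
    end
    if (now() - start_time) > self.queue_conf.max_retry_time then
      kong.log.err("could not send entries due to max_retry_time exceeded")
      return nil, err
    end
    retry_count = retry_count + 1
    -- capped exponential backoff; delay values are illustrative
    sleep(math.min(0.01 * 2 ^ retry_count, 1))
  end
end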

Contributor replied:

I left a review comment; I hope my observation above didn't cost you too much time and effort 😸

@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 17, 2024
@@ -189,12 +190,27 @@ function HttpLogHandler:log(conf)

  local queue_conf = Queue.get_plugin_params("http-log", conf, make_queue_name(conf))
  kong.log.debug("Queue name automatically configured based on configuration parameters to: ", queue_conf.name)
  local entry = cjson.encode(kong.log.serialize())

  if queue_conf.max_batch_size == 1 then
Contributor commented:

I agree with this comment by Hans:

I think the new behavior should be enabled by an explicit configuration parameter (no_queue) rather than implicit by a certain queue length. This would make it easier to understand.

@vm-001 (Member, Author) replied:

I don't have a strong opinion about it. Let's move this to an internal discussion, along with the potential backport.

if premature then
  return
end
queue:handle({ entry })
Contributor commented:

The problem with this approach is that we'd be using the Queue to process entries in parallel, i.e. not what a queue normally does. To properly reuse the code (now in the queue) that processes an entry, we should probably extract it from the Queue module and then refer to it both from the queue and from here.

If this is too complex for the scope of this task, we might want to do it as a separate refactor. (A sketch of such an extraction follows below.)
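A hedged sketch of the alternative samugi describes: pull the send-with-retry logic out of the Queue module into a standalone helper that both the queue's processing loop and the handler's timer callback can call. The module path and function name are hypothetical; initial_retry_delay, max_retry_delay, and max_retry_time follow Kong's existing queue parameters:

-- Hypothetical module, e.g. kong/tools/queue_sender.lua; names invented
-- for illustration.
local _M = {}

-- Send one batch of entries, retrying until max_retry_time is exceeded.
function _M.send_with_retry(handler, handler_conf, queue_conf, entries)
  local start_time = ngx.now()
  local delay = queue_conf.initial_retry_delay or 0.01
  repeat
    local ok = handler(handler_conf, entries)
    if ok then
      return true
    end
    -- back off between attempts, doubling up to max_retry_delay
    ngx.sleep(delay)
    delay = math.min(delay * 2, queue_conf.max_retry_delay or 60)
  until (ngx.now() - start_time) > queue_conf.max_retry_time
  return nil, "max_retry_time exceeded"
end

return _M

Both the Queue's internal processing loop and the handler's timer_at callback would then call the same helper, keeping a single retry implementation.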

@vm-001 (Member, Author) replied:

I agree; let's extract the retry logic outside of the Queue in another PR.

Labels
cherry-pick kong-ee (schedule this PR for cherry-picking to kong/kong-ee), plugins/http-log, size/L
4 participants