This query returns, for each tag in a user-supplied list, the number of SO questions, the total number of views, and the number of unanswered questions. It usually runs fine, but it times out with the error message "Line 0: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding." if I add gnuplot to the list.
Is there a way to optimise it to avoid getting an error?
The line AND Posts.Tags LIKE '%<%' should not make any difference, but it seems to increase the chances that the query returns a result (maybe that's just a coincidence?).
Adding the execution plan screenshots below.
Bonus question: is there a better way to use STRING_SPLIT to create the auxiliary table?
-- INPUT EXAMPLE: pandas-datareader,google-finance-api,yahoo-finance,alpha-vantage,ta-lib,yfinance,google-finance
-- If the query times out, run it in batches, two or three tags at a time
-- Auxiliary table: one requested tag per row, compared case-sensitively
CREATE TABLE #KeyTags (
    key_word VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CS_AS
);
GO
INSERT INTO #KeyTags (key_word)
SELECT value FROM STRING_SPLIT(##CommaSeparatedOptions:string##, ',');
GO
-- Per tag: total views, number of questions, and open questions with no answers
SELECT #KeyTags.key_word AS key_word,
       SUM(CAST(Posts.ViewCount AS BIGINT)) AS viewed,
       COUNT(Posts.ViewCount) AS question,
       SUM(CASE WHEN Posts.AnswerCount < 1 AND Posts.ClosedDate IS NULL THEN 1 ELSE 0 END) AS unanswered_question
FROM Posts
JOIN #KeyTags ON Posts.Tags LIKE CONCAT('%<', #KeyTags.key_word, '>%')
WHERE Posts.PostTypeId = 1  -- questions only
  AND Posts.Tags LIKE '%<%'
GROUP BY #KeyTags.key_word
ORDER BY viewed DESC;
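To make the join condition concrete, here is a minimal sketch of what the split and the LIKE pattern produce. It assumes a hard-coded two-tag list in place of the ##CommaSeparatedOptions:string## parameter, and like_pattern is only an illustrative column name, not part of the query above.

```sql
-- Illustration only: the literal list stands in for ##CommaSeparatedOptions:string##
-- STRING_SPLIT returns a single column named "value", one row per list element
SELECT value AS key_word,
       CONCAT('%<', value, '>%') AS like_pattern  -- hypothetical alias, e.g. %<gnuplot>%
FROM STRING_SPLIT('gnuplot,yfinance', ',');
```

Each pattern starts with a wildcard, so as far as I understand the comparison against Posts.Tags cannot seek on an index and has to be evaluated for every question row and every requested tag.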
bonus question should be asked separately