All Questions
524 questions
0 votes, 1 answer, 22 views
accuracy_score returns the same value across different classifiers
I'm working on a project in Google Colab for fake news classification with the LIAR dataset. I am using three different feature extractors (TF-IDF, DistilBERT and Llama 2) and seven classifiers (...
0 votes, 1 answer, 45 views
Benepar for syntactic segmentation
I want to use Benepar with a French model to do syntactic segmentation.
I followed the tutorial but I always get this error:
RuntimeError: Error(s) in loading state_dict for ChartParser:
...
1 vote, 0 answers, 48 views
Chunking text with bounding box values
I have used the Azure OCR service to extract text from PDFs. For each page in a PDF, the OCR output contains a list of text lines along with the bounding box values for that line. My original approach ...
0 votes, 1 answer, 34 views
Converting a text file into a data frame in R
In R, I have a .txt file that I would like to extract data from as a character string. My .txt file is formatted like the following, with list items separated by numbers: 1. [text1] 2. [text2] 3. [text3] and ...
0 votes, 0 answers, 60 views
Trying to cluster short survey answers (1 to 10 words). Am I on the right track?
Here's an explanation of what I want to build (it's a project for school).
1. A user uploads a file containing just the answers to whatever question was asked in the survey.
2. The machine finds ...
0 votes, 1 answer, 535 views
LangChain SQL agent with context
I am working on a LangChain-based SQL chat application and want my agent to understand context with respect to the user session. For example:
User - What is highest order placed in last placed?
Bot - Order id : ...
1 vote, 1 answer, 137 views
ValueError: Cannot use a compiled regex as replacement pattern with regex=False
I'm working on a project in Google Colab where I use the following versions:
!pip install "gensim==4.2.0"
!pip install "texthero==1.0.5"
Until recently, I received the following ...
0 votes, 0 answers, 24 views
SVM training (fitting) doesn't work for text classification
I'm trying to fit the sentiment5 data, which contains 2 variables:
"tweet" that has vectorized text data (using TF-IDF) and
"target" that has 1 and 0 for positive and negative.
I ...
0 votes, 0 answers, 40 views
LaTeX/mathematical text cleaning / mwparser
I am looking to build a data science focused search engine and had a question for those familiar with parsing text with mathematical notation. So I have set up a standard WikiAPI class with a method ...
0 votes, 1 answer, 79 views
How to decide on the correct NLP approach for a project
I'm working on an NLP project. My task is to determine the category and sentiment score of Turkish call center conversations from the conversations themselves. We are using Python as our programming ...
1 vote, 0 answers, 52 views
I have a Spark DataFrame and want to generate n-grams the way the Gensim bigram model does
I have a text DataFrame (tweets). I am using Spark for high-volume data handling and I want to generate bigrams in the same way Gensim's bigram models do. I have been using Spark NLP for ...
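For readers wondering what "the way Gensim does it" amounts to: as far as I understand it, Gensim's default phrase scorer scores each adjacent word pair as `(count(a, b) - min_count) * vocab_size / (count(a) * count(b))` and keeps pairs whose score exceeds a threshold. Below is a minimal pure-Python sketch of that scoring, not Gensim's actual code. The function name `phrase_scores` is hypothetical, and taking `vocab_size` as the number of distinct unigrams is an assumption that may differ from what Gensim counts internally. In Spark, the unigram and bigram counts would presumably come from aggregations, with the same formula applied afterwards.

```python
from collections import Counter


def phrase_scores(sentences, min_count=1, threshold=1.0):
    """Score adjacent token pairs with a Gensim-style default formula.

    `sentences` is an iterable of token lists. Returns a dict mapping
    (word_a, word_b) to its score, keeping only pairs above `threshold`.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour.
        bigrams.update(zip(tokens, tokens[1:]))

    # Assumption: vocabulary size = number of distinct unigrams.
    vocab_size = len(unigrams)

    scores = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            scores[(a, b)] = score
    return scores


tweets = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "city"],
]
result = phrase_scores(tweets, min_count=1, threshold=1.0)
print(result)
```

On this toy input only the pair `("new", "york")` survives, since every other pair occurs exactly once and scores zero with `min_count=1`.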
0 votes, 0 answers, 114 views
How to segment text in PDF files to extract headings
Let's say I have a couple hundred PDF files from which I have to extract each heading and its relevant text for further processing. For each heading, how do I do that while keeping the format of the ...
0 votes, 0 answers, 33 views
Faster approach to collect text data from multiple URLs and save it to the DataFrame row-wise for each URL
I have a DataFrame of shape (700000, 5). One column of the DataFrame holds a unique text file URL per row.
Example:
Two columns shown for reference:
Task identifier | Text url
ub12345567 | https://someadd....
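Since downloading many small text files is mostly network-bound, one common way to speed this up is a thread pool. The sketch below uses only the standard library. The names `fetch_text` and `fetch_all`, the worker count, and the timeout are all assumptions for illustration, not anything taken from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download one URL and return its body as text, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # a malformed URL raises ValueError.
        return None


def fetch_all(urls, fetch=fetch_text, max_workers=32):
    """Fetch every URL concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so result i belongs to urls[i].
        return list(pool.map(fetch, urls))
```

Because the results come back in input order, they can be assigned straight back as a new column, for example `df["text"] = fetch_all(df["Text url"].tolist())`, assuming the URL column is named as in the example above. For 700,000 URLs it is probably worth processing in batches and saving intermediate results.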
0 votes, 0 answers, 37 views
How can I identify the number of occurrences of multiple custom emotions, grouped by line, team, and personal ID?
I have a data frame like the following (but much larger and with repeated observations across time):
df <- data.frame(
participant_ID = 1:4,
TeamID = c("A", "A", "B", ...