
All Questions

0 votes · 1 answer · 22 views

accuracy_score with the same value across different classifier methods

I'm doing a project, on Google Colab, for fake news classification with the LIAR dataset. I am running three different feature extractors (TF-IDF, DistilBERT and LLAMA 2) and seven classifiers (...
asked by lucasa.lisboa
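A common cause of identical accuracy_score values across very different classifiers is class imbalance: every model collapses to predicting the majority class. A minimal stdlib sketch of that failure mode, with hypothetical labels rather than the real LIAR data:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced test labels (not from the LIAR dataset itself).
y_true = ["real"] * 8 + ["fake"] * 2

# If several classifiers all collapse to the majority class, their
# accuracy_score values will be identical even though the models differ.
majority = Counter(y_true).most_common(1)[0][0]
preds_a = [majority] * len(y_true)   # e.g. an under-trained SVM
preds_b = [majority] * len(y_true)   # e.g. logistic regression on weak features

print(accuracy(y_true, preds_a), accuracy(y_true, preds_b))  # 0.8 0.8
```

Inspecting the confusion matrix or the distribution of predicted labels per classifier usually reveals whether this is happening.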
0 votes · 1 answer · 45 views

Benepar for syntactic segmentation

I want to use Benepar with a French model to do syntactic segmentation. I followed the tutorial, but I always get this error: RuntimeError: Error(s) in loading state_dict for ChartParser: ...
asked by nassima.crt
1 vote · 0 answers · 48 views

Chunking text with bounding box values

I have used the Azure OCR service to extract text from PDFs. For each page in a PDF, the OCR output contains a list of text lines along with the bounding box values for that line. My original approach ...
asked by AnonymousMe
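One way to chunk OCR output is to start a new chunk whenever the vertical gap between consecutive lines exceeds a threshold. A sketch under assumed inputs: each line is a `(text, top_y, bottom_y)` tuple in page coordinates (the Azure OCR response would need to be mapped into this shape first), already sorted top to bottom:

```python
def chunk_lines(lines, gap_threshold=12):
    """Group OCR lines into chunks: start a new chunk whenever the vertical
    gap between consecutive lines exceeds gap_threshold (same page units
    as the bounding boxes)."""
    chunks, current = [], []
    prev_bottom = None
    for text, top, bottom in lines:
        if prev_bottom is not None and top - prev_bottom > gap_threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(text)
        prev_bottom = bottom
    if current:
        chunks.append(" ".join(current))
    return chunks

lines = [("Invoice #42", 10, 20), ("Date: 2024-01-01", 24, 34),
         ("Item  Qty  Price", 80, 90), ("Widget  2  9.99", 94, 104)]
print(chunk_lines(lines))
```

The threshold is a guess that would need tuning per document layout; the same idea extends to horizontal gaps for multi-column pages.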
0 votes · 1 answer · 34 views

In R, converting a text file into a data frame

In R I have a .txt file that I would like to extract data from as a character string. My .txt file is formatted like the following, with a list separated by numbers: 1. [text1] 2. [text2] 3. [text3] and ...
asked by sebastian.mendoza
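The splitting logic for a "1. … 2. … 3. …" string is a single regex split on the numeric markers. A Python sketch of the idea (the same pattern works in R via `strsplit(raw, "[0-9]+\\.\\s*")` followed by `trimws`):

```python
import re

raw = "1. [text1] 2. [text2] 3. [text3]"

# Split on the "N. " markers; empty fields (e.g. before "1.") are dropped.
items = [s.strip() for s in re.split(r"\d+\.\s*", raw) if s.strip()]
print(items)  # ['[text1]', '[text2]', '[text3]']
```

From there the list becomes one column of a data frame (`data.frame(text = items)` in R).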
0 votes · 0 answers · 60 views

Trying to cluster short survey answers (1 to 10 words). Am I on the right track?

Here's an explanation of what I want to build (it's a school project): 1. A user puts in a file with just the answers to whatever question was asked in the survey. 2. The machine finds ...
asked by Shimz
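For answers this short, a real pipeline would likely use sentence embeddings plus KMeans or HDBSCAN, but the grouping idea can be sketched with stdlib-only greedy clustering on token overlap (the answers below are made up):

```python
def jaccard(a, b):
    """Token-overlap similarity between two sets of words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_cluster(answers, threshold=0.3):
    """Assign each answer to the first cluster whose accumulated token set
    is similar enough, else start a new cluster."""
    clusters = []  # list of (token_set, member_answers)
    for ans in answers:
        tokens = set(ans.lower().split())
        for cl_tokens, members in clusters:
            if jaccard(tokens, cl_tokens) >= threshold:
                members.append(ans)
                cl_tokens |= tokens   # grow the cluster's vocabulary
                break
        else:
            clusters.append((tokens, [ans]))
    return [members for _, members in clusters]

answers = ["too expensive", "price too high", "great support",
           "support was great", "too expensive for me"]
print(greedy_cluster(answers))
```

Token overlap misses synonyms ("price" vs "expensive"), which is exactly what embedding-based clustering fixes; this sketch just shows the grouping mechanics.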
0 votes · 1 answer · 535 views

LangChain SQL agent with context

I am working on a LangChain-based SQL chat application and want my agent to understand context within the user session. For example: User: What is the highest order placed? Bot: Order id : ...
asked by matvi
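LangChain ships its own memory components, but the underlying idea is simple: keep the session's past turns and prepend them to each new question so the agent can resolve references like "that order". A framework-free sketch (the class and method names here are illustrative, not LangChain APIs):

```python
class SessionContext:
    """Minimal per-session memory: past turns are prepended to each new
    question before it is handed to the SQL agent."""
    def __init__(self, max_turns=5):
        self.history = []          # list of (question, answer) pairs
        self.max_turns = max_turns

    def build_prompt(self, question):
        turns = self.history[-self.max_turns:]
        context = "\n".join(f"User: {q}\nBot: {a}" for q, a in turns)
        return (context + "\n" if context else "") + f"User: {question}"

    def record(self, question, answer):
        self.history.append((question, answer))

ctx = SessionContext()
ctx.record("What is the highest order placed?", "Order id: 42")
prompt = ctx.build_prompt("Who placed that order?")
print(prompt)
```

Keeping one `SessionContext` per user session and passing `build_prompt(...)` to the agent gives it the conversational grounding to interpret follow-up questions.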
1 vote · 1 answer · 137 views

ValueError: Cannot use a compiled regex as replacement pattern with regex=False

I'm doing a project, on Google Colab, where I use the following versions: !pip install "gensim==4.2.0" !pip install "texthero==1.0.5" Until recently, I received the following ...
asked by lucasa.lisboa
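Recent pandas versions raise exactly this ValueError when `Series.str.replace` is given a pre-compiled pattern without `regex=True` (older texthero code paths hit this). A minimal reproduction and fix, with toy data:

```python
import re
import pandas as pd

s = pd.Series(["hello   world", "foo\t bar"])
pat = re.compile(r"\s+")

# s.str.replace(pat, " ")  # raises on recent pandas:
# ValueError: Cannot use a compiled regex as replacement pattern with regex=False

# Passing regex=True tells pandas to treat the compiled pattern as a regex.
cleaned = s.str.replace(pat, " ", regex=True)
print(cleaned.tolist())  # ['hello world', 'foo bar']
```

When the failing call is inside a pinned library rather than your own code, the practical options are pinning an older pandas compatible with that library or patching the call site to add `regex=True`.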
0 votes · 0 answers · 24 views

SVM training/fitting doesn't work for text classification

I'm trying to fit the sentiment5 data, which contains two variables: "tweet", which has vectorized text data (using TF-IDF), and "target", which has 1 and 0 for positive and negative. I ...
asked by Arcane Persona
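A frequent cause of this failure is that a TF-IDF matrix saved into a DataFrame column (e.g. round-tripped through CSV) arrives as strings, which the SVM cannot fit. The usual remedy is to keep the raw text and let a pipeline do the vectorizing. A sketch with made-up stand-in data (real code would load the sentiment5 file instead):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the "tweet" and "target" columns.
tweets = ["love this", "great day", "hate this", "awful day",
          "really love it", "truly awful"]
target = [1, 1, 0, 0, 1, 0]

# Fit the vectorizer and the SVM together on raw text; this avoids ever
# storing the sparse TF-IDF matrix in a DataFrame column.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(tweets, target)
preds = model.predict(["love it", "awful"])
print(list(preds))
```

The pipeline also guarantees the same vocabulary is used at train and predict time, which is easy to get wrong when vectorizing manually.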
0 votes · 0 answers · 40 views

LaTeX/mathematical text cleaning / mwparser

I am looking to build a data science focused search engine and had a question for those familiar with parsing text with mathematical notation. So I have set up a standard WikiAPI class with a method ...
asked by goofy-data-scientist
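For indexing, one pragmatic option is to strip the `<math>…</math>` islands out of the wikitext before tokenizing, so equations never pollute the term index. A regex sketch over a made-up page fragment (mwparserfromhell would additionally handle `{{math|…}}` templates):

```python
import re

def strip_math(wikitext):
    """Remove <math>...</math> islands so the remaining prose can be indexed."""
    return re.sub(r"<math[^>]*>.*?</math>", " ", wikitext, flags=re.DOTALL)

page = "The loss is <math>L = -\\sum_i y_i \\log p_i</math> summed over classes."
cleaned_text = re.sub(r"\s+", " ", strip_math(page)).strip()
print(cleaned_text)
```

The alternative design, keeping the math but indexing it as separate tokens (e.g. symbol names), depends on whether users are expected to search by notation.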
0 votes · 1 answer · 79 views

How to decide on the correct NLP approach for a project

I'm working on an NLP project. My task is to determine the category and Sentiment Score of Turkish Call Center conversations from the conversations themselves. We are using Python as our programming ...
asked by Bilal Sedef
1 vote · 0 answers · 52 views

I have a Spark DataFrame and I want to generate n-grams the way the gensim bigram model does

I have a text DataFrame (tweets). I am using Spark for high-volume data handling and I want to generate bigrams the same way Gensim's bigram model does. I have been using Spark NLP for ...
asked by Criscas05
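What distinguishes Gensim's `Phrases` from a plain n-gram transformer is its collocation score: `(count(ab) - min_count) * vocab_size / (count(a) * count(b))`, keeping only pairs above a threshold. Once reduced to word and pair counts, that formula is just aggregations, which is why it ports to Spark. A pure-Python sketch of the scoring on toy sentences:

```python
from collections import Counter
from itertools import chain

def gensim_style_bigrams(sentences, min_count=2, threshold=0.5):
    """Score adjacent word pairs with Gensim Phrases' default formula and
    return the pairs whose score exceeds the threshold."""
    words = Counter(chain.from_iterable(sentences))
    pairs = Counter(p for s in sentences for p in zip(s, s[1:]))
    vocab = len(words)
    return {
        (a, b): (n_ab - min_count) * vocab / (words[a] * words[b])
        for (a, b), n_ab in pairs.items()
        if (n_ab - min_count) * vocab / (words[a] * words[b]) > threshold
    }

sents = [["new", "york", "is", "big"], ["new", "york", "city"],
         ["big", "city"], ["new", "york", "wins"]]
scores = gensim_style_bigrams(sents)
print(scores)
```

In Spark the same computation would be two `groupBy().count()` aggregations (unigrams and adjacent pairs) joined together, with the score computed as a column expression.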
0 votes · 0 answers · 114 views

How to segment text in PDF files to extract headings

Let's say I have a couple hundred PDF files from which I have to extract each heading and its relevant text, for further processing per heading. How do I do that while keeping the format of the ...
asked by USMAN SIDDIQUI
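Layout-aware extractors (pdfplumber, PyMuPDF) expose font sizes, which is the most reliable heading signal. When only plain extracted text is available, a heuristic over line shape can still work; a sketch on a made-up document, assuming one text line per list entry:

```python
import re

def is_heading(line):
    """Text-only heuristic: short lines that don't end a sentence and are
    numbered ('2.1 Title'), ALL CAPS, or Title Case."""
    line = line.strip()
    if not line or len(line.split()) > 8 or line.endswith("."):
        return False
    return (bool(re.match(r"^\d+(\.\d+)*\s+\S", line))
            or line.isupper()
            or line.istitle())

def split_by_headings(lines):
    """Map each detected heading to the body lines under it."""
    sections, current = {}, None
    for line in lines:
        if is_heading(line):
            current = line.strip()
            sections[current] = []
        elif current and line.strip():
            sections[current].append(line.strip())
    return sections

doc = ["1 Introduction", "This report covers...", "RESULTS",
       "Accuracy improved.", "2.1 Discussion", "More text here"]
print(split_by_headings(doc))
```

Across a few hundred PDFs the heuristic thresholds would need tuning per document family; if the PDFs share a template, font-size rules from pdfplumber are far more robust.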
0 votes · 0 answers · 33 views

Faster approach to collect text data from multiple URLs and save it to the DataFrame row-wise for each URL

I have a DataFrame of shape (700000, 5). One column of the DataFrame has a single unique text file URL per row. Example, two columns shown for reference: Task identifier | Text url | ub12345567 | https://someadd....
asked by Remrem
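For 700k small downloads the bottleneck is network latency, so threads help far more than a row-by-row loop. A sketch with `ThreadPoolExecutor`; the `fetch_text` below is a stand-in stub, since a runnable example shouldn't hit the real URLs:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_text(url):
    """Stand-in for the real download; in practice use a shared
    requests.Session and return the response body text."""
    return f"contents of {url}"

def fetch_all(urls, max_workers=32):
    """Download many small text files concurrently; map() preserves
    input order, so results line up with the DataFrame rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_text, urls))

urls = [f"https://example.com/{i}.txt" for i in range(3)]
texts = fetch_all(urls)
print(texts[0])
```

Because `map()` keeps input order, `df["text"] = fetch_all(df["Text url"])` assigns results row-wise; for 700k URLs, batching and retry/timeout handling around the real fetch would also be needed.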
0 votes · 0 answers · 37 views

How can I identify the number of occurrences of multiple custom emotions, grouped by line, team, and personal ID?

I have a data frame like the following (but much larger and with repeated observations across time): df <- data.frame( participant_ID = 1:4, TeamID = c("A", "A", "B", ...
asked by user22571454
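Counting custom-dictionary emotion words grouped by team and participant reduces to one nested loop over rows and tokens. A Python sketch of the logic with hypothetical stand-in rows (in R the equivalent would be `dplyr::group_by` plus `stringr::str_count` per emotion term):

```python
from collections import Counter

rows = [  # hypothetical stand-in for the data frame
    {"participant_ID": 1, "TeamID": "A", "text": "happy happy proud"},
    {"participant_ID": 2, "TeamID": "A", "text": "angry proud"},
    {"participant_ID": 3, "TeamID": "B", "text": "happy"},
]
emotions = {"happy", "angry", "proud"}  # the custom emotion dictionary

# One counter keyed by (team, participant, emotion) handles all groupings.
counts = Counter()
for row in rows:
    for word in row["text"].lower().split():
        if word in emotions:
            counts[(row["TeamID"], row["participant_ID"], word)] += 1

print(counts[("A", 1, "happy")])  # 2
```

With repeated observations over time, adding the time variable to the key tuple extends the same grouping without restructuring the code.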
