699

I'm trying to use pandas to manipulate a .csv file but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried to read the pandas docs, but found nothing.

My code is simple:

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

How can I resolve this? Should I use the csv module or another language?

11
  • 16
If this error arises when reading a file written by pandas.to_csv(), it MIGHT be because there is a '\r' in one of the column names, in which case to_csv() will actually write the subsequent column names into the first column of the data frame, causing a difference between the number of columns in the first X rows. This difference is one cause of the C error. Commented Jan 23, 2017 at 0:56
  • 43
Sometimes just explicitly giving the "sep" parameter helps. It seems to be a parser issue.
    – gilgamash
    Commented May 23, 2018 at 12:30
  • 11
This error may also arise when you're using a comma as the delimiter and you have more commas than expected (more fields in the error row than defined in the header). So you need to either remove the additional field or remove the extra comma if it's there by mistake. You can fix this manually, and then you don't need to skip the error lines.
    – tsveti_iko
    Commented Aug 22, 2018 at 9:44
  • 16
The comment from gilgamash helped me. Open the csv file in a text editor (like the Windows editor or Notepad++) to see which character is used for separation. If it's a semicolon, for example, try pd.read_csv("<path>", sep=";"). Do not use Excel for checking, as it sometimes puts the data into columns by default and therefore removes the separator.
    – Julian
    Commented Jun 19, 2019 at 13:05
  • 1
If the separator does not work, I also recommend trying the parameter engine='python', which worked for me. The C parser had some kind of trouble with the type of report I was analyzing.
    – jasper
    Commented Jun 17, 2020 at 13:15

51 Answers

1041

You could also try:

data = pd.read_csv('file1.csv', on_bad_lines='skip')

Do note that this will cause the offending lines to be skipped. If you don't expect many bad lines and want to (at least) know their number and IDs, use on_bad_lines='warn'. For more advanced handling of bad lines, you can pass a callable.
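For example, a minimal sketch of the callable form (assuming pandas >= 1.4, where a callable handler requires the Python engine); the handler receives the fields of each bad line and returns None to skip it or a corrected list of fields to keep it:

import pandas as pd

bad_rows = []

def handle_bad_line(fields):
    bad_rows.append(fields)   # keep the raw fields for later inspection
    return None               # returning None drops the line

data = pd.read_csv('file1.csv', on_bad_lines=handle_bad_line, engine='python')
print(f"Skipped {len(bad_rows)} bad lines")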

Edit

For Pandas < 1.3.0 try

data = pd.read_csv("file1.csv", error_bad_lines=False)

as per pandas API reference.

13
  • 19
Stumbled on this answer; is there a way to fill missing columns on lines that output something like expected 8 fields, saw 9? Commented Sep 24, 2014 at 10:11
  • 44
    The better solution is to investigate the offending file and to correct the bad lines so that they can be read by read_csv. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?
    – abcd
    Commented Oct 6, 2014 at 22:57
  • 7
    Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this. Commented Oct 7, 2014 at 2:17
  • 19
    Passing in names=["col1", "col2", ...] for the max number of expected columns also works, and this is how I solved this issue when I came across it. See: stackoverflow.com/questions/18039057/… Commented Jan 8, 2019 at 18:58
  • 36
    This should not be the accepted answer, lines will be skipped and you don't know why...
    – PV8
    Commented Dec 5, 2019 at 13:44
214

It might be an issue with

  • the delimiters in your data
  • the first row, as @TomAugspurger noted

To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,

df = pandas.read_csv(filepath, sep='delimiter', header=None)

In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.

According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
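If you do want to let pandas sniff the delimiter itself, a minimal sketch looks like this (sep=None hands parsing to the Python engine, which uses Python's built-in sniffer):

df = pandas.read_csv(filepath, sep=None, engine='python')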

Another solution may be to try to auto-detect the delimiter:

import csv
import pandas as pd

csv_file = open(filepath)

# use the first 2 lines of the file to detect the separator
temp_lines = csv_file.readline() + '\n' + csv_file.readline()
dialect = csv.Sniffer().sniff(temp_lines, delimiters=';,')

# remember to go back to the start of the file for the next time it's read
csv_file.seek(0)

df = pd.read_csv(csv_file, sep=dialect.delimiter)

2
74

This may well be a delimiter issue: files saved with a .csv extension are sometimes actually tab-separated, so try reading the file with the tab character (\t) as the separator:

data=pd.read_csv("File_path", sep='\t')
4
  • 6
    @MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('\t'), semicolon, and possibly additional spaces. :)
    – DJGrandpaJ
    Commented Apr 13, 2016 at 19:54
  • 1
In my case it was a separator issue. read_csv apparently defaults to commas, and I have text fields which include commas (and the data was stored with a different separator anyway)
    – user108569
    Commented Jul 17, 2018 at 16:41
If commas are used in the values but tab is the delimiter and sep is not used (or, as suggested above, whatever the delimiter is assumed to be occurs in the values), then this error will arise. Make sure that the delimiter does not occur in any of the values, or else some rows will appear to have the incorrect number of columns
    – demongolem
    Commented Mar 11, 2020 at 11:10
  • 1
I'm using Excel 2016 while creating the CSV, and using sep=';' works for me
    – greendino
    Commented Mar 20, 2020 at 6:34
71

The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.

Try it with data = pd.read_csv(path, skiprows=2)

0
46

I had this problem, where I was trying to read in a CSV without passing in column names.

df = pd.read_csv(filename, header=None)

I specified the column names in a list beforehand and then passed them into names, and that solved it immediately. If you don't have column names set, you could just create as many placeholder names as the maximum number of columns that might be in your data.

col_names = ["col1", "col2", "col3", ...]
df = pd.read_csv(filename, names=col_names)
4
  • 2
This answer is better because rows don't get deleted, compared to using error_bad_lines=False. Additionally, you can easily figure out which lines were the problem ones once you make a dataframe from this solution.
    – zipline86
    Commented Mar 27, 2020 at 21:47
  • I agree with @zipline86. This answer is safe and intelligent. Commented Apr 23, 2020 at 0:09
This solution is too hackish for me, but it works. I solved my issue by passing engine='python' in read_csv to deal with variable column sizes
    – Savrige
    Commented Sep 18, 2020 at 15:49
  • 1
What if you had fewer column names than the row has (e.g. the row has 10 columns but you wrote only three column names)? How would you automatically add more columns in addition to the already specified column names? Commented Dec 27, 2022 at 21:31
43

Your CSV file might have a variable number of columns, and read_csv inferred the number of columns from the first few rows. Two ways to solve it in this case:

1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0])

2) Or use names = list(range(0,N)) where N is the max number of columns.
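For example, a minimal sketch of option 2, assuming the widest row in the file has no more than 12 fields:

import pandas as pd

N = 12  # an assumed upper bound on the number of fields in any row
df = pd.read_csv('file1.csv', names=range(N))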

1
  • names=range(N) should suffice (using pandas=1.1.2 here)
    – rvf
    Commented Dec 3, 2020 at 9:11
28

I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:

data = pd.read_csv('file1.csv', error_bad_lines=False)

If you want to keep the lines, an ugly kind of hack for handling the errors is to do something like the following:

line     = []
expected = []
saw      = []
cont     = True

while cont == True:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        errortype = str(e).split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            cerror = str(e).split(':')[1].strip().replace(',', '')
            nums   = [n for n in cerror.split(' ') if str.isdigit(n)]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')
            cont = False  # stop so an unrelated error can't loop forever

if line != []:
    # Handle the errors however you want
    pass

I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
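A rough sketch of that re-insertion step, under the assumption that the indices collected in line are usable file row numbers and that the bad rows simply carry extra trailing fields:

if line:
    # re-read only the previously skipped rows, padded out to the widest width seen
    wide = pd.read_csv('file1.csv', header=None, names=range(max(saw)),
                       skiprows=lambda i: i not in line)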

1
  • Thank you for this solution !! It's a very useful tip.
    – WangSung
    Commented Aug 9, 2021 at 10:12
18

The following worked for me (I posted this answer, because I specifically had this problem in a Google Colaboratory Notebook):

df = pd.read_csv("/path/foo.csv", delimiter=';', skiprows=0, low_memory=False)
4
  • 1
I experienced problems when not setting | as the delimiter for my .csv. I'd rather try this approach first, instead of skipping lines or bad lines.
    – ivanleoncz
    Commented Sep 2, 2019 at 16:16
  • I also had the same problem, I assumed "\t" would be detected as a delimiter by default. It worked when I explicitly set the delimiter to "\t".
    – Rahul Jha
    Commented Sep 20, 2019 at 17:39
  • 1
    I had the same problem for a large .csv file (~250MB), with some corrupted lines spanning less columns than the data frame actually has. I was able to avoid the exception in two ways: 1) By modifying (for example deleting) a couple of unrelated rows far away from the line causing the exception. 2) By setting low_memory=False. In other .csv files with the same type of mal-formatted lines, I don't observe any problems. In summary, this indicates that the handling of large-file by pandas.read_csv() somehow is flawed.
    – normanius
    Commented Mar 23, 2021 at 13:11
  • 1
    I filed a bug report related to my previous comment.
    – normanius
    Commented Mar 23, 2021 at 15:28
18

You can try:

data = pd.read_csv('file1.csv', sep='\t')
1
  • 3
    While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. From Review Commented Sep 8, 2020 at 18:04
15

I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.

Edit: I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work), which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter, which could circumvent the user's current error but introduce others.

I usually get around this by reading the extra data into a file and then using the read_csv() method.

The exact solution might differ depending on your actual file, but this approach has worked for me in several cases
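A rough sketch of a related approach, assuming you know how many preamble and footer lines the report carries (the file name and counts below are illustrative; skipfooter needs the Python engine):

import pandas as pd

# assuming 4 preamble lines and 2 footer lines that don't match the data layout
df = pd.read_csv('report.csv', skiprows=4, skipfooter=2, engine='python')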

13

I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.

Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.

Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.

Hope that helps.

0
10

Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

The error gives a clue to solve the problem: "Expected 2 fields in line 3, saw 12". "saw 12" means line 3 has 12 fields, while the first row has only 2.

When you have data like the one shown below, if you skip rows then most of the data will be skipped

data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""

If you don't want to skip any rows, do the following:

import pandas as pd

# First, let's find the maximum number of columns across all the rows
with open("file_name.csv", 'r') as temp_f:
    # get the number of columns in each line
    col_count = [len(l.split(",")) for l in temp_f.readlines()]

# Generate column names (names will be 0, 1, 2, ..., maximum columns - 1)
# (alternatively, use the "saw" value from the error message, e.g. 8 for
#  "Expected 4 fields in line 2, saw 8")
column_names = [i for i in range(max(col_count))]

data = pd.read_csv("file_name.csv", header=None, names=column_names)

Use range instead of manually setting names as it will be cumbersome when you have many columns.

Additionally, you can fill the NaN values with 0 if you need all rows to have the same length, e.g. for clustering (k-means):

new_data = data.fillna(0)
9

The dataset that I used had a lot of quote marks (") that were not part of the formatting. I was able to fix the error by including this parameter for read_csv():

quoting=3 # 3 correlates to csv.QUOTE_NONE for pandas
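For example, a minimal sketch of the full call, using the csv module's constant rather than the bare number (the file name is just a placeholder):

import csv
import pandas as pd

df = pd.read_csv('file1.csv', quoting=csv.QUOTE_NONE)  # csv.QUOTE_NONE == 3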
1
  • 3
    stumbled across the exact same thing. As far as I'm concerned, this is the correct answer. The accepted one just hides the error.
    – lhk
    Commented Aug 18, 2019 at 13:40
8

I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:

1115794 4218    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
1144102 3180    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
368444  2328    "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""



import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='\t', index_col=2, header=None, engine = 'c')

pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

This says it has something to do with the C parsing engine (which is the default one). Maybe changing to the Python one will change something:

counts = pd.read_table(path_counts, sep='\t', index_col=2, header=None, engine='python')

Segmentation fault (core dumped)

Now that is a different error.
If we go ahead and try to remove spaces from the table, the error from the python engine changes once again:

1115794 4218    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
1144102 3180    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
368444  2328    "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""


_csv.Error: '   ' expected after '"'

And it becomes clear that pandas was having problems parsing our rows. To parse a table with the python engine, I needed to remove all spaces and quotes from the table beforehand. Meanwhile, the C engine kept crashing even with commas in rows.

To avoid creating a new file with replacements I did this, as my tables are small:

from io import StringIO
with open(path_counts) as f:
    input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('\0',''))
    counts = pd.read_table(input, sep='\t', index_col=2, header=None, engine='python')

tl;dr
Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.

8

Use the delimiter parameter:

pd.read_csv(filename, delimiter=",", encoding='utf-8')

It should then read correctly.

0
7

As far as I can tell, and after taking a look at your file, the problem is that the csv file you're trying to load has multiple tables. There are empty lines, or lines that contain table titles. Try to have a look at this Stackoverflow answer. It shows how to achieve that programmatically.

Another dynamic approach would be to use the csv module, read every single row one at a time and make sanity checks / apply regular expressions to infer whether the row is a title/header/values/blank line. This approach has one more advantage: you can split/append/collect your data in Python objects as desired.
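A rough sketch of that row-by-row idea, assuming data rows have exactly 12 fields (the count from the error message); adapt the check to your file:

import csv
import pandas as pd

rows = []
with open('GOOG Key Ratios.csv', newline='') as f:
    for row in csv.reader(f):
        # keep only rows that look like data; skip titles, headers and blank lines
        if len(row) == 12:
            rows.append(row)

df = pd.DataFrame(rows)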

The easiest of all would be to use the pandas function pd.read_clipboard() after manually selecting and copying the table to the clipboard, in case you can open the csv in Excel or something.

Irrelevant:

Additionally, irrelevant to your problem, but because no one made mention of this: I had this same issue when loading some datasets such as seeds_dataset.txt from UCI. In my case, the error was occurring because some separators had more whitespace than a true tab \t. See line 3 in the following for instance

14.38   14.21   0.8951  5.386   3.312   2.462   4.956   1
14.69   14.49   0.8799  5.563   3.259   3.586   5.219   1
14.11   14.1    0.8911  5.42    3.302   2.7     5       1

Therefore, use \t+ in the separator pattern instead of \t.

data = pd.read_csv(path, sep='\t+', header=None, engine='python')
0
6

For those who are having a similar issue with Python 3 on a Linux OS:

pandas.errors.ParserError: Error tokenizing data. C error: Calling
read(nbytes) on source failed. Try engine='python'.

Try:

df = pd.read_csv('file.csv', encoding='utf8', engine='python')
2
  • 1
    I had a file where there were commas in some certain fields/columns and while trying to read through pandas read_csv() it was failing, but after specifying engine="python" within read_csv() as a parameter it worked - Thanks for this! Commented Jan 9, 2022 at 19:30
  • This results in more rows than intended for me..
    – Matt Yoon
    Commented Sep 5, 2022 at 19:08
6

I believe the solutions

engine='python'
error_bad_lines = False

will be good if the extra columns are dummy columns and you want to delete them. In my case, the second row really had more columns and I wanted those columns to be integrated and to have the number of columns = MAX(columns).

Please refer to the solution below, which I could not find anywhere else:

try:
    df_data = pd.read_csv(PATH, header = bl_header, sep = str_sep)
except pd.errors.ParserError as err:
    str_find = 'saw '
    int_position = int(str(err).find(str_find)) + len(str_find)
    str_nbCol = str(err)[int_position:]
    l_col = range(int(str_nbCol))
    df_data = pd.read_csv(PATH, header = bl_header, sep = str_sep, names = l_col)
1
  • I will take any better way to find the number of columns in the error message than what i just did
    – Laurent T
    Commented May 13, 2020 at 6:45
5

Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for kwarg compression resolved my problem.

result = pandas.read_csv(data_source, compression='gzip')
5

In my case the separator was not the default "," but Tab.

pd.read_csv('file_name.csv', sep='\\t', lineterminator='\\r', engine='python', header='infer')

Note: "\t" did not work as suggested by some sources. "\\t" was required.

4

Simple resolution: Open the csv file in Excel and save it under a different file name, still in csv format. Try importing it again in Spyder, and your problem will be resolved!

1
  • 1
    Dude! Thank you. Your solution worked like a light switch.
    – Jon Fillip
    Commented Sep 30, 2021 at 19:23
4

The issue is with the delimiter. Find what kind of delimiter is used in your data and specify it like below:

data = pd.read_csv('some_data.csv', sep='\t')
4

I came across multiple solutions for this issue. Lots of folks have also given good explanations in their answers. But for beginners I think the two methods below will be enough:

import pandas as pd

#Method 1

data = pd.read_csv('file1.csv', error_bad_lines=False)
#Note that this will cause the offending lines to be skipped.

#Method 2 using sep

data = pd.read_csv('file1.csv', sep='\t')
3

Sometimes the problem is not how you use Python, but the raw data itself.
I got this error message:

Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.

It turned out that the description column sometimes contained commas. This means that the CSV file needs to be cleaned up or another separator used.
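If you want to locate the offending rows before cleaning the file, here is a small sketch (the file name is a placeholder) that compares each row's field count against the header's:

import csv

with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    n = len(next(reader))                    # field count of the header row
    for i, row in enumerate(reader, start=2):
        if len(row) != n:
            print(f"line {i}: {len(row)} fields, expected {n}")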

3

An alternative that I have found to be useful in dealing with similar parsing errors uses the CSV module to re-route data into a pandas df. For example:

import csv
import pandas as pd
path = 'C:/FileLocation/'
file = 'filename.csv'
f = open(path+file,'rt')
reader = csv.reader(f)

#once contents are available, I then put them in a list
csv_list = []
for l in reader:
    csv_list.append(l)
f.close()
#now pandas has no problem getting into a df
df = pd.DataFrame(csv_list)

I find the CSV module to be a bit more robust to poorly formatted comma separated files and so have had success with this route to address issues like these.

3

The following sequence of commands works (I lose the first line of the data, since no header=None is present, but at least it loads):

df = pd.read_csv(filename, usecols=range(0, 42))
df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']

Following does NOT work:

df = pd.read_csv(filename, names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'], usecols=range(0, 42))

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

Following does NOT work:

df = pd.read_csv(filename, header=None)

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

Hence, in your problem you have to pass usecols=range(0, 2)

3

Sometimes a cell contains a comma ",". Because of that, pandas can't read it. Try a ";" delimiter:

df = pd.read_csv(r'yourpath', delimiter=";")
0
3

You can use:

pd.read_csv("mycsv.csv", delimiter=";")

Pandas 1.4.4

It can be the delimiter of your file; open it as a text file and look for the delimiter. You may then have columns that are empty and unnamed because of the rows that contain too many delimiters.

Therefore, you can handle them with pandas by checking for values. For me, this is better than skipping lines.
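A small sketch of that clean-up step; the column-dropping logic below is an assumption about how the stray delimiters show up (as fully empty, unnamed columns):

import pandas as pd

df = pd.read_csv("mycsv.csv", delimiter=";")

# drop columns that are completely empty, then any leftover "Unnamed" columns
df = df.dropna(axis=1, how="all")
df = df.loc[:, ~df.columns.astype(str).str.startswith("Unnamed")]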

3

Check if you are loading the csv with the correct separator.

df = pd.read_csv(csvname, header=0, sep=",")
2

use pandas.read_csv('CSVFILENAME', header=None, sep=', ')

when trying to read csv data from the link

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

I copied the data from the site into my csv file. It had extra spaces, so I used sep=', ' and it worked :)
