I’ve recently jumped head first into the world of Ruby. Overall, I really like the language; some concepts are a bit weird at first, but they make sense and are quite useful. When it comes to CSV processing, however, Ruby completely falls apart, and PHP shines through. Let’s explore!
Consider the following CSV sample:
"Company\nName",category,url
ABC News,Network TV,http://www.abc.com
E!,"Lifestyle, Fashion",http://www.eonline.com
Note the \n in the first line. Technically, the \n is valid there because the field is wrapped in quotes. This is tricky, but it should not be impossible to handle.
Let’s start with the bad news: Ruby chokes on this. There are basically two main libraries for CSVs: the built-in library that’s based on an earlier fork of faster_csv, and the smarter_csv gem. Neither of them could parse this at all; both threw an error: “CSV::MalformedCSVError: Illegal quoting in line 1.” PHP, however, has no problem with this. fgetcsv gets through it just fine, and is pretty fast. I also tried Python out to see how that goes, and the built-in csv library handled it without a problem as well.
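For reference, here is a minimal sketch of what the Python check might look like. It assumes the \n in the sample stands for a real line break inside the quoted header field; the names SAMPLE and parse_sample are made up for illustration and are not the exact script used for this post.

```python
import csv
import io

# The sample from above. Assumption: the \n in the first field is a real
# line break inside the quotes, so the header spans two physical lines.
SAMPLE = (
    '"Company\nName",category,url\n'
    'ABC News,Network TV,http://www.abc.com\n'
    'E!,"Lifestyle, Fashion",http://www.eonline.com\n'
)


def parse_sample(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # newline="" leaves line endings untouched so the csv module can deal
    # with line breaks inside quoted fields on its own.
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer))


rows = parse_sample(SAMPLE)
for row in rows:
    print(row)
```

The reader treats the quoted line break as part of the first header field and the quoted comma as part of the category, so the sample comes back as three rows of three fields each.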
So, my totally scientific and empirical evidence proves that you should not use Ruby for CSV parsing unless you can guarantee that your document is formatted exactly how Ruby wants it. Other languages are much more suitable.