Category: ETL

Remove CSV text qualifiers within a field using Python

Have you ever tried to import a CSV into a database and found that it won’t load because one of your CSV fields has a text qualifier inside the field itself? In this post I look at how to resolve this issue using a few lines of Python code.
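One way the problem can be handled is sketched below. This is a minimal illustration, not the code from the post. It assumes the delimiter is a comma, the text qualifier is a double quote, every field is wrapped in qualifiers, and the sequence `","` never legitimately occurs inside a field. Under those assumptions, the real field boundaries are the `","` sequences, so any other qualifier left inside a field is a stray one that can be dropped. The function names `clean_line` and `clean_csv` are hypothetical.

```python
import csv
import io

QUALIFIER = '"'  # assumed text qualifier
DELIMITER = ','  # assumed field delimiter


def clean_line(line: str) -> list:
    """Split one fully qualified CSV line and remove stray qualifiers inside fields.

    Assumes every field is wrapped in QUALIFIER, so the true field boundaries
    are the qualifier-delimiter-qualifier sequences (e.g. '","').
    """
    line = line.rstrip('\r\n')
    # Drop the opening qualifier of the first field and the closing one of the last.
    if line.startswith(QUALIFIER):
        line = line[1:]
    if line.endswith(QUALIFIER):
        line = line[:-1]
    boundary = QUALIFIER + DELIMITER + QUALIFIER
    fields = line.split(boundary)
    # Any qualifier still inside a field is the stray one that breaks the load.
    return [field.replace(QUALIFIER, '') for field in fields]


def clean_csv(text: str) -> str:
    """Clean every non-empty line and re-emit it with all fields qualified."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for line in text.splitlines():
        if line:
            writer.writerow(clean_line(line))
    return out.getvalue()


if __name__ == '__main__':
    raw = '"1","He said "hi" twice","London"\n"2","Plain","Paris"\n'
    print(clean_csv(raw), end='')
```

For the sample input, the stray quotes around `hi` are removed and the cleaned line becomes `"1","He said hi twice","London"`. If the qualifiers inside a field need to be kept rather than dropped, an alternative is to write the fields back with `csv.writer` without stripping them; it escapes embedded quotes by doubling them, which most database loaders accept.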

My Top 10 ETL Best Practices

It’s been over ten years since I coded my first ever ETL routine. Since then I’ve collected a number of important lessons and best practices, presented here as my very own ‘Ten Commandments’ of ETL.

1. Know your data

Whether you’re defining an ETL strategy, designing a set of data flows or writing the code, the single […]

The Cloud

Amazon Redshift

At its recent re:Invent partner conference, Amazon unveiled a cloud-based data warehousing solution, provisionally named Redshift. The Register has more details. This is not the first time that the concepts of “cloud” and “BI” have come together. What is interesting, though, is how Amazon is seeking to position this new technology (or, at least, […]