Author:  David Mertz
Publisher:  Packt
Publication Date:  February 5, 2021
Prerequisites: Some Python or R

Disclaimer: The publisher sent me a copy of this book for review. I promise that everything said here is my own opinion regardless. All reviews at the Cross Trained Mind are open and honest.

About This Book

This is a crucial book thanks to the deluge of data we currently have and use in our software applications. It looks at both structure and content issues with data of various types and the pros and cons of methods to clean it enough to be useful.

Who Is This For?

This is a useful book for anyone who imports data into their application, which is a good number of us. Given the Python and R code in the book, it’s good to have some knowledge and experience of one of these languages, but that’s about all you need to know. I personally recommend that anyone earning a computer science degree work through this book around the same time they learn about data structures and algorithms.

Organization

The overall organization of this book follows a standard data pipeline that you might set up for your application, along with the cleansing issues you might need to resolve at each stage. This is, in my opinion, a great way to set up the book. Each chapter presents its topics, then exercises, and then a summary. All in all, this is a well-organized book.

Did This Book Succeed?

I believe that the author did a tremendous job on a difficult and large topic. This is one of the most time-consuming and least talked about portions of any data pipeline for data science and machine learning tasks. It is also one of the most important. Anyone working in data or AI needs to read through this book and learn how to implement its processes in order to have cleaner and therefore more useful data.

Rating and Final Thoughts

I give this book a 5 out of 5.

It is useful, timely, and well organized. While it may not be set up like a cookbook or reference, it can be used as such. It is well suited as a textbook either for a course or self-study. It should be in anyone’s personal library, ready to be pulled when there is data to be cleansed.
