Hi,

The title may sound extremely hi-tech to someone who has never heard of pandas ;), but what I have written is a simple hello-world-equivalent program, which I hope will start to help with my day-to-day analysis. As always, the aim is to show the advantage of something rather than hammering away with theory!

I was going through the various Python packages available for analyzing data and came across the pandas package along with the NumPy package. These are not included in a default Python installation; if you want them on your system, you should install them via pip. I have them installed already, which is why pip complains in the image below.

 

Note:

Understand why you would need something like pandas/NumPy even if you have never heard of them; that’s the point of this tiny program.

Imagine how you would solve this if you had never known about pandas/NumPy, and you will see the power of these packages. Again, you don’t have to know them in depth to realize their full power.

 

Now, coming to the requirement: below is a sample spreadsheet, a CSV sheet which contains values such as RMA_Status, device names, etc. It’s a cooked-up sheet, as you can clearly see.

You can find it here as well:

https://github.com/yukthr/auts/blob/master/random_programs/rma_status.csv

Requirement: pretty simple. Get the list of all devices which are marked RMA_Status Yes. Most of the time we can do this with grep/egrep, but it gets tough when you have a lot of fields. Since most tools already give us a CSV, this should be a handy way to analyze the data, or to set up a cron job that does it on a daily basis.

It’s a very simple program, nothing complicated (I’m not even remotely capable of that 😉).

Below we are importing pandas and NumPy. If you are not familiar with these packages, I would suggest going through a basic intro; YouTube is full of them, and their use cases can save you a lot of time.
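In case the image does not load, here is a minimal sketch of the import-and-load step. The inline CSV and the column names Device and RMA_Status are my assumptions standing in for rma_status.csv; the real sheet may be laid out differently.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for rma_status.csv; the real column names may differ.
SAMPLE_CSV = """Device,RMA_Status
router1,Yes
switch1,No
firewall1,Yes
router2,No
"""

# read_csv accepts a file path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Each column can be handed over to NumPy as a plain array.
statuses = df["RMA_Status"].to_numpy()

print(df)
print(type(statuses))
```

With the real file you would pass its path to pd.read_csv instead of the StringIO object.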

Create a Boolean NumPy array holding True and False values based on your own data conditions. Here, we are looking for the word ‘yes’. The code below is the crucial part; once we have it, all that remains is printing the rows which are ‘True’ and leaving out the rows which are ‘False’.
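A rough sketch of how such a Boolean array could be built, assuming a status column named RMA_Status (my guess at the header) and assuming we want to ignore case and stray spaces:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are assumptions, not the real sheet.
df = pd.DataFrame({
    "Device": ["router1", "switch1", "firewall1", "router2"],
    "RMA_Status": ["Yes", "No", "yes ", "No"],
})

# Trim spaces and lowercase first, so 'Yes', 'YES' and 'yes ' all match.
normalized = df["RMA_Status"].str.strip().str.lower()

# Comparing element-wise gives one True/False per row.
is_rma = (normalized == "yes").to_numpy(dtype=bool)

print(is_rma)
```

The array has one entry per row of the sheet, in the same order, which is what lets us line it up against the DataFrame later.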

Finally, we take the Boolean array and supply it back to our DataFrame, which returns all the rows that are ‘True’.
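Putting the filter step together, something along these lines should work. The helper name devices_marked_for_rma and both column names are hypothetical, made up for this sketch; it is not necessarily what the linked script does.

```python
import pandas as pd


def devices_marked_for_rma(df, status_col="RMA_Status", device_col="Device"):
    """Return the device names whose status reads 'yes' (any case)."""
    # Boolean mask: True where the status is 'yes' after trimming and lowercasing.
    mask = df[status_col].astype(str).str.strip().str.lower() == "yes"
    # Indexing with the mask keeps only the rows where it is True.
    return df.loc[mask, device_col].tolist()


# Hypothetical sample rows standing in for the real CSV.
sample = pd.DataFrame({
    "Device": ["router1", "switch1", "firewall1", "router2"],
    "RMA_Status": ["Yes", "No", "YES", "No"],
})

rma_devices = devices_marked_for_rma(sample)
print(rma_devices)
```

For a daily cron job, the sample DataFrame could be swapped for pd.read_csv pointed at the exported file.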

 

Code for this – https://github.com/yukthr/auts/blob/master/random_programs/pandas_rma_analysis.py

This can be extended to whatever use case we can think of. People who are good at Excel will do this in a jiffy, but I am not an expert in Excel.

 

-Rakesh