Pandas Essentials

Pandas is a Python library used for data analysis. It provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive, and it has a user-friendly API.

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in biological sciences, finance, statistics, social science, and many areas of engineering. In a future article, I’ll explain why more than one data structure is needed in pandas, but for now, let’s get on with the flow.

This article introduces the essential attributes and methods of pandas and shows how to use them.

Topics covered: object creation, viewing data, selection, setting values, boolean indexing, and missing data.

pandas has two data structures, namely Series and DataFrame.

Essentially, we can think of a Series as a 1-dimensional container for scalars, and a DataFrame as a 2-dimensional container whose columns are Series.

Let us create a Series and a DataFrame in the following cells.
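The original cell is not preserved in this copy, so here is a minimal sketch of creating a Series. The values are illustrative assumptions, not the author's data.

```python
import numpy as np
import pandas as pd

# Illustrative values (assumed); np.nan marks a missing entry.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
```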

As shown above, a Pandas Series is created by passing a list of values to the Series() constructor and letting Pandas create a default integer index automatically.

Now let’s create a DataFrame by passing a NumPy array, with a datetime index and labeled columns, to the DataFrame() constructor:
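A sketch of such a cell, assuming a start date of 2013-01-01 and seeded random values (both are placeholders, since the original cell is missing):

```python
import numpy as np
import pandas as pd

# Assumed inputs: the start date and the random seed are illustrative.
dates = pd.date_range("2013-01-01", periods=10)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 10)),   # 10x10 NumPy array of floats
    index=dates,                     # datetime index
    columns=list("ABCDEFGHIJ"),      # labeled columns A..J
)
print(df.shape)
```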

The above cell creates a DataFrame with 10 rows and 10 columns, an index of dates, and the column names A through J.

We can also create a DataFrame from a Python dictionary. This is done by passing a dictionary of objects that can each be converted to a Series-like structure.
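As a rough illustration, the dictionary below mixes scalars, a Series, a NumPy array, and a categorical. The keys and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical dictionary: each value becomes one column.
df2 = pd.DataFrame(
    {
        "A": 1.0,                                   # scalar, repeated on every row
        "B": pd.Timestamp("2013-01-02"),            # scalar timestamp
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
print(df2.dtypes)
```

Each column keeps its own dtype, which is the main difference from a plain NumPy array.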

Pandas has two methods used to view the rows of a dataset, head() and tail().

And two attributes used to view the index and columns of a dataset, index and columns.

These are helpful when you want to have a quick understanding of your dataset.

Let us see how they work in the following cells.
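A sketch of viewing the first and last rows, using an assumed 10x10 frame (dates and values are illustrative):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

top = df.head()       # first 5 rows by default
bottom = df.tail(3)   # last 3 rows
print(top)
print(bottom)
```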

To view the index and columns:
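For example, on an assumed frame with placeholder dates and values:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

print(df.index)     # the row labels (a DatetimeIndex here)
print(df.columns)   # the column labels
```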

The describe() method provides a quick summary of the numerical data contained in your dataset.
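A short sketch, again on an assumed random frame:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

summary = df.describe()   # count, mean, std, min, quartiles, max per column
print(summary)
```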

You can transpose your DataFrame with:

And sort by axes and values with:

View the transposed DataFrame
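Transposing can be sketched as follows (the frame is an assumed stand-in for the article's dataset):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

transposed = df.T   # rows become columns and vice versa
print(transposed)
```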

In pandas, rows and columns are known as axis 0 and axis 1 respectively.
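A sketch of both kinds of sorting, assuming the same illustrative frame: first by axis labels, then by the values of one column.

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

# Sort along axis 1 (the column labels), in descending order.
by_columns = df.sort_index(axis=1, ascending=False)

# Sort the rows by the values in column B.
by_values = df.sort_values(by="B")
print(by_values)
```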

The above sorts the DataFrame based on the values of column B. By default it sorts in ascending order, but that can be changed by passing the argument ascending=False.

Although standard Python/NumPy expressions for selecting and setting are intuitive and handy for interactive work, the optimized pandas data access methods are recommended for production code. These methods include .at, .iat, .loc and .iloc.

Selecting with index and column names can be achieved like so:
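A minimal sketch with .loc, assuming illustrative dates. Note that label slices include both endpoints.

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

all_rows = df.loc[:, ["A", "B"]]                         # every row, two columns
subset = df.loc["2013-01-02":"2013-01-04", ["A", "B"]]   # three dates, inclusive
print(subset)
```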

Reduction in the dimension of the returned object:
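For instance, selecting a single row label returns a Series rather than a DataFrame (sketch on an assumed frame):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

row = df.loc[dates[0], ["A", "B"]]   # one row label: 2-D input, 1-D result
print(type(row))
```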

A scalar value can be obtained by:

For getting fast access to a scalar (equivalent to the method used above):
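Both forms can be sketched together: .loc with a single row and column label, and .at as the faster scalar accessor. The frame is an assumed stand-in.

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

value_loc = df.loc[dates[0], "A"]   # general label-based lookup
value_at = df.at[dates[0], "A"]     # scalar-only accessor
print(value_loc, value_at)
```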

Selection can be done via the position of the passed integers:
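A sketch with .iloc on an assumed frame, where position 3 is the fourth row:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

fourth_row = df.iloc[3]   # zero-based position, returned as a Series
print(fourth_row)
```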

By integer slices, acting similar to NumPy/python:
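For example (assumed frame; unlike label slices, the end position is excluded):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

block = df.iloc[3:5, 0:2]   # rows 3-4, columns 0-1
print(block)
```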

By list of integer position locations, similar to NumPy/python:
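A sketch picking non-adjacent rows and columns by position (assumed frame):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

picked = df.iloc[[1, 2, 4], [0, 2]]   # rows 1, 2, 4 and columns 0, 2
print(picked)
```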

For slicing rows explicitly

For slicing columns explicitly
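Both explicit slices can be sketched side by side (assumed frame; a bare colon keeps the whole axis):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

rows = df.iloc[1:3, :]   # rows 1-2, all columns
cols = df.iloc[:, 1:3]   # all rows, columns 1-2
print(rows.shape, cols.shape)
```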

For getting a value explicitly:

For fast access to a scalar:
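The positional counterparts of the label-based scalar lookups are .iloc and .iat. A sketch on an assumed frame:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

value_iloc = df.iloc[1, 1]   # second row, second column
value_iat = df.iat[1, 1]     # same cell, scalar-only accessor
print(value_iloc, value_iat)
```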

Setting a new column automatically aligns the data by the indexes.

A Series with its own date index can be assigned to the ‘F’ column of our DataFrame object; the values are matched to rows by index label.
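To make the alignment visible, this sketch assumes a Series whose dates start one day later than the frame's, so the first row has no match and becomes NaN. The Series values are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

# Hypothetical series, indexed one day later than df.
s1 = pd.Series(range(1, 11), index=pd.date_range("2013-01-02", periods=10))

df["F"] = s1   # aligned on index labels, not on position
print(df["F"])
```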

Setting values by label

This sets the value of the first row in column ‘A’ to zero.
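A sketch of that assignment with .at, on an assumed frame:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

df.at[dates[0], "A"] = 0   # row label dates[0], column label "A"
print(df["A"].head(3))
```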

Setting values by position

This sets the value of the first row in column ‘B’ to zero.
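The positional version might look like this with .iat (assumed frame; column position 1 is ‘B’):

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

df.iat[0, 1] = 0   # row position 0, column position 1
print(df["B"].head(3))
```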

Setting by assigning a NumPy array
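A sketch that overwrites a whole column with an array of the same length. The choice of column ‘D’ and the value 5.0 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

# The array must have one entry per row.
df.loc[:, "D"] = np.full(len(df), 5.0)
print(df["D"])
```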

View the changes made to our DataFrame object by the above setting operations.

To obtain a particular column in a DataFrame

Using a single column’s values to select data:
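Both steps can be sketched together: take one column, then use a condition on it to keep only the matching rows (assumed frame, illustrative threshold of zero).

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

col_a = df["A"]                # a single column, returned as a Series
positive_a = df[df["A"] > 0]   # rows where column A is positive
print(positive_a)
```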

Selecting values from a DataFrame where a boolean condition is met.
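When the condition is applied to the whole frame, the shape is kept and cells that fail the condition become NaN. A sketch on an assumed frame:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

positive = df[df > 0]   # same shape; non-positive cells are replaced by NaN
print(positive)
```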

Using the isin() method for filtering:
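A sketch of isin(), assuming a hypothetical text column named "K" added to a six-row copy of the frame:

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

df2 = df.iloc[:6].copy()
df2["K"] = ["one", "one", "two", "three", "four", "three"]   # hypothetical labels

filtered = df2[df2["K"].isin(["two", "four"])]   # keep rows whose K is in the list
print(filtered)
```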

Pandas primarily uses the value np.nan to represent missing data. By default, it is excluded from computations.

Reindexing allows you to change, add, or delete the index on a specified axis. This returns a copy of the data.

To get the boolean mask where values are NaN:

Doesn’t look like we have any missing data here 😋 But let’s continue with the tutorial anyway 🤓

To drop any rows that have missing data.

Filling missing data
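Because the walkthrough's dataset has no gaps, this sketch deliberately creates some: reindexing with a hypothetical new column "K" yields NaN there, and the mask, drop, and fill operations are then applied to that copy. All names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed setup: illustrative dates and seeded random values.
dates = pd.date_range("2013-01-01", periods=10)
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 10)),
                  index=dates, columns=list("ABCDEFGHIJ"))

# Reindex to four rows and add a new column "K", which starts as all NaN.
df1 = df.reindex(index=dates[0:4], columns=["A", "B", "C", "K"])
df1.loc[dates[0]:dates[1], "K"] = 1   # fill the first two rows only

mask = df1.isna()                 # True where a value is missing
dropped = df1.dropna(how="any")   # drop rows containing any NaN
filled = df1.fillna(value=5)      # replace NaN with 5
print(mask)
```

Each of these calls returns a new object, so df1 itself is left unchanged.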

Summary

This is just a quick introduction to the Pandas workflow, as there are a lot of topics and concepts not described here. Full understanding and mastery only come with regular practice and consistency, and these will help a great deal on your journey to becoming a Data Scientist.
