sql and pandas: part 5
selecting rows using conditionals
Previously we learned how to get rows from the top or bottom but sorting columns. Now we are going to learn how filter rows using logic.
We start by looking at finding all the rows where length is 5.
select *
from employees
where length = 5In pandas we first set the conditional to values equal to 5 and then wrap that statement with df[].
df.loc[df['length'] == 5]If we run just the inner statement we can see that the output is a boolean value based on whether it meets the criteria we set.
df['length'] == 5
0 True
1 False
Name: length, dtype: boolWe can also select string values the same way we select integers.
select *
from employees
where location = 'Moon'And in pandas:
df.loc[df['location'] == 'Moon']Now let’s select rows in our table that contain ‘g’ in it.
select *
from employees
where name like "%g%"Do do this in pandas we need to introduce the str method.
df[df['name'].str.contains('G', regex=False)]This will return the Tom Grady row, but if we pass ‘g’ instead we return nothing. But if we ignore case by setting case=False then we will return the row.
We can do the same thing in sql by turning the name column into lowercase and then searching for the lowercase g.
where lower(name) like "%g%"If we want to select by multiple conditions we can use a AND, OR, NOT operators
where lower(name) like "%g%" or location = "Moon" In this case we will return both rows as they fit the OR condition.
df[(df['name'].str.contains('G', regex=False)) |
(df['location'].str.contains('Moon', regex=False))]In pandas we use “|” to represent the or condition.
Further reading:
