BLOG

 

Turning Data Into Awesome With sqldf and pandasql

Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts.

head(df1)  head(df2)

R and sqldf

sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")
 
  COUNT(*)
1        4
 
sqldf("SELECT
            df2.firstname,
            df2.lastname,
            df1.var1,
            df2.state
      FROM df1
      INNER JOIN df2 ON df1.personid = df2.id
      WHERE df2.state = 'TX'")
 
  firstname lastname  var1 state
1     David    Spade -2.09    TX
2       Joe  Montana  1.16    TX
 
sqldf("SELECT
            df2.state,
            COUNT(df1.var1)
      FROM df1
      INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
      GROUP BY df2.state")
 
   state COUNT(df1.var1)
1     AZ               1
2     CA               1
3     GA               1
4     IL               1
5     NC               1
6     NY               1
7     OK               1
8     SC               1
9     TX               1
10    VT               1

Python and pandasql

import pandasql as ps
 
 
q1 = """SELECT COUNT(*) FROM df2 WHERE state = 'CA'"""
 
print ps.sqldf(q1, locals())
 
   COUNT(*)
0         4
 
q2 = """
    SELECT 
        df2.firstname, 
        df2.lastname, 
        df1.var1, 
        df2.state 
    FROM df1 INNER JOIN df2 ON df1.personid = df2.id 
    WHERE df2.state = "TX";
    """
 
print ps.sqldf(q2, locals())
 
  firstname lastname  var1 state
0     David    Spade -2.09    TX
1       Joe  Montana  1.16    TX
 
q3 = """SELECT 
            df2.state, 
            COUNT(df1.var1) 
      FROM df1
      INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
      GROUP BY df2.state"""
 
print ps.sqldf(q3, locals())
 
  state  COUNT(df1.var1)
0    AZ                1
1    CA                1
2    GA                1
3    IL                1
4    NC                1
5    NY                1
6    OK                1
7    SC                1
8    TX                1
9    VT                1

Post a comment