본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 필터링] 비상관 서브쿼리 결과를 만족하는 데이터 추출 - 55 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 19.

포스팅 목차

    55. Display the names of employees who earn a Sal more than that of James or that of salary lesser than that of Scott.

     

    * 직원 'JAMES' 의 급여보다 크고, 직원 'SCOTT'의 급여보다 적은 직원의 정보를 출력하시오.


    • Oracle : 서브쿼리(Subquery), where 구문
    • 파이썬 Pandas : iloc[], 비교구문
    • R 프로그래밍 : 비교구문
    • R Dplyr Package : filter(), pull() : 벡터 추출, unique()
    • R sqldf Package : 서브쿼리(Subquery), where 구문
    • Python pandasql Package : 서브쿼리(Subquery), where 구문
    • R data.table Package : 비교구분
    • SAS Proc SQL : 서브쿼리(Subquery), where 구문
    • SAS Data Step : CROSS JOIN (카테시안 곱) 구현, Macro 변수 할당
    • Python Dfply Package : filter_by(), @dfpipe (사용자 정의 함수), 비교구분
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • 비상관 서브쿼리
    Oracle Programming
    select * 
    from   emp 
    where sal < (select sal from emp where ename='SCOTT') 
      and sal > (select sal from emp where ename='JAMES');

     


    2. Python Pandas(파이썬)

    • iloc[0] : pandas.core.series.Series 를 numpy.int64로 변경
    • max(emp[emp['ename']=='SCOTT'].sal) 를 지정하면 바로 해결
    Python Programming
    emp[(emp['sal'] < (emp[emp['ename']=='SCOTT'].sal).iloc[0])
      & (emp['sal'] > (emp[emp['ename']=='JAMES'].sal).iloc[0])]

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    9 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30
    10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10

     


    3. R Programming (R Package)

    • 데이터 필터링
    R Programming
    %%R
    
    emp[   emp$sal < (emp[emp$ename == 'SCOTT',]$sal)
         & emp$sal > (emp[emp$ename == 'JAMES',]$sal),]

     

    Results
    # A tibble: 9 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30
    2  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30
    3  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20
    4  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    5  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30
    6  7782 CLARK  MANAGER   7839 1981-01-09  2450    NA     10
    7  7844 TURNER SALESMAN  7698 1981-09-08  1500     0     30
    8  7876 ADAMS  CLERK     7788 1983-01-12  1100    NA     20
    9  7934 MILLER CLERK     7782 1982-01-23  1300    NA     10

     


    4. R Dplyr Package

    • 데이터 필터링
    R Programming
    %%R
    
    emp %>%
      filter(sal < ( emp %>% filter(ename == 'SCOTT') 
                           %>% pull(sal) 
                           %>% unique ) 
             & sal > ( emp %>% filter(ename == 'JAMES') 
                           %>% pull(sal) 
                           %>% unique ) 
            )

     

    Results
    # A tibble: 9 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30
    2  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30
    3  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20
    4  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    5  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30
    6  7782 CLARK  MANAGER   7839 1981-01-09  2450    NA     10
    7  7844 TURNER SALESMAN  7698 1981-09-08  1500     0     30
    8  7876 ADAMS  CLERK     7788 1983-01-12  1100    NA     20
    9  7934 MILLER CLERK     7782 1982-01-23  1300    NA     10

     


    5. R sqldf Package

    • 비상관 서브쿼리
    R Programming
    %%R
    sqldf("select * 
           from   emp 
           where  sal < (select sal from emp where ename='SCOTT') 
             and  sal > (select sal from emp where ename='JAMES')")

     

    Results
      empno  ename      job  mgr   hiredate  sal comm deptno
    1  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30
    2  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30
    3  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20
    4  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30
    5  7698  BLAKE  MANAGER 7839 1981-03-01 2850   NA     30
    6  7782  CLARK  MANAGER 7839 1981-01-09 2450   NA     10
    7  7844 TURNER SALESMAN 7698 1981-09-08 1500    0     30
    8  7876  ADAMS    CLERK 7788 1983-01-12 1100   NA     20
    9  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10

     


    6. Python pandasql Package

    • 비상관 서브쿼리
    Python Programming
    ps.sqldf("select *   \
              from   emp \
              where sal < (select sal from emp where ename='SCOTT') \
                and sal > (select sal from emp where ename='JAMES')")

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    1 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30
    2 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    3 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30
    4 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    5 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    6 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30
    7 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    8 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10

     


    7. R data.table Package

    • 데이터 필터링
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[sal< (DT[ename == "SCOTT", (sal)]) & sal > (DT[ename == "JAMES", (sal)]) ,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno
    1:  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30
    2:  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30
    3:  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20
    4:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30
    5:  7698  BLAKE  MANAGER 7839 1981-03-01 2850   NA     30
    6:  7782  CLARK  MANAGER 7839 1981-01-09 2450   NA     10
    7:  7844 TURNER SALESMAN 7698 1981-09-08 1500    0     30
    8:  7876  ADAMS    CLERK 7788 1983-01-12 1100   NA     20
    9:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10

     


    8. SAS Proc SQL

    • 비상관 서브쿼리를 이용한 데이터 필터링
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select ename 
        from   emp 
        where  sal < (select sal from emp where ename='SCOTT') 
          and  sal > (select sal from emp where ename='JAMES');
    QUIT;
    PROC PRINT data=STATSAS_1(obs=5);RUN;

     

    Results
    OBS ename
    1 ALLEN
    2 WARD
    3 JONES
    4 MARTIN
    5 BLAKE

     


    9. SAS Data Step

    • DATA STEP(CROSS JOIN-카테시안 곱) 방식;
    SAS Programming
    %%SAS sas
    
    DATA STATSAS_3; 
     SET EMP(WHERE=(ename='SCOTT') RENAME=(SAL=SCOTT_SAL)); 
     SET EMP(WHERE=(ename='JAMES') RENAME=(SAL=JAMES_SAL));
         DO I=1 TO KOBS; 
            SET EMP NOBS=KOBS POINT=I;
            IF SAL < SCOTT_SAL AND SAL > JAMES_SAL THEN OUTPUT;
         END;    
    RUN;
    
    PROC PRINT data=STATSAS_3(OBS=3);RUN;

     

    Results
    OBS empno ename job mgr hiredate SCOTT_SAL comm deptno JAMES_SAL sal
    1 7499 ALLEN SALESMAN 7698 1981-02-20 3000 300 30 950 1600
    2 7521 WARD SALESMAN 7698 1981-02-22 3000 500 30 950 1250
    3 7566 JONES MANAGER 7839 1981-04-02 3000 . 20 950 2975

     


    • Macro 변수 할당 방식
    %%SAS sas
    
    proc sql noprint;
      select SAL into :SCOTT_SAL
      from   EMP
      WHERE  ename='SCOTT';
    quit;
    
    proc sql noprint;
      select SAL into :JAMES_SAL
      from   EMP
      WHERE  ename='JAMES';
    quit;
    
    DATA STATSAS_3;
     SET EMP;
         IF SAL < &SCOTT_SAL AND SAL > &JAMES_SAL THEN OUTPUT;
    RUN;
    PROC PRINT data=STATSAS_3(obs=3);RUN;
    

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    2 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20

     


    10. Python Dfply Package

    • 데이터 필터링
    Python Programming
    # [참고] 함수 사용. (52번 예제 참고)
    
    @dfpipe
    def pull_fun(df, column=-1):
        return df.loc[:, column].item()
    
    emp >> filter_by( X.sal < ( emp >> filter_by(X.ename == 'SCOTT') >> pull_fun('sal')) , \
                      X.sal > ( emp >> filter_by(X.ename == 'JAMES') >> pull_fun('sal')) )

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    9 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30
    10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글