본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 필터링] 비상관 서브쿼리를 만족하는 데이터 추출 - 53 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 16.

포스팅 목차

    53. Display the names of the salesman who earns a salary more than the highest salary of any clerk.

     

    * 사무직 직원의 최대 연봉보다 더 많은 연봉을 받는 판매직 직원의 정보를 출력하시오.


    • Oracle : 서브쿼리(Subquery), max() 함수
    • 파이썬 Pandas : max() 함수
    • R 프로그래밍 : max() 함수
    • R Dplyr Package : filter(), max() 함수, pull() : 벡터 추출, unique()
    • R sqldf Package : 서브쿼리(Subquery), max() 함수
    • Python pandasql Package : 서브쿼리(Subquery), max() 함수
    • R data.table Package : max() 함수
    • SAS Proc SQL : 서브쿼리(Subquery), max() 함수
    • SAS Data Step : CROSS JOIN (카테시안 곱) 구현, Macro 변수
    • Python Dfply Package : filter_by(), max() 함수, @dfpipe (사용자 정의 함수)
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • 비상관 서브쿼리
    Oracle Programming
    select ename 
    from   emp 
    where  job='SALESMAN' 
      and  sal > (select max(sal) from emp where job='CLERK');

     


    2. Python Pandas(파이썬)

    • Max 함수
    Python Programming
    emp[(emp['job']=='SALESMAN') & (emp['sal'] > max(emp[emp['job']=='CLERK'].sal))]

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    9 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30

     


    3. R Programming (R Package)

    • Max 함수
    R Programming
    %%R
    
    emp[ emp$job == 'SALESMAN' & emp$sal > max(emp[emp$job == 'CLERK',]$sal),]

     

    Results
    # A tibble: 2 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30
    2  7844 TURNER SALESMAN  7698 1981-09-08  1500     0     30

     


    4. R Dplyr Package

    • Filter 함수와 Max 함수
    R Programming
    %%R
    
    emp %>%
      filter(job == "SALESMAN" & sal > ( emp %>% filter(job == 'CLERK') 
                                             %>% summarise(sal_max = max(sal)) 
                                             %>% pull(sal_max) 
                                             %>% unique) )

     

    Results
    # A tibble: 2 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30
    2  7844 TURNER SALESMAN  7698 1981-09-08  1500     0     30

     


    5. R sqldf Package

    • 비상관 서브쿼리
    R Programming
    %%R
    sqldf(" select * 
            from   emp 
            where  job='SALESMAN' 
              and  sal > (select max(sal) from emp where job='CLERK');")

     

    Results
      empno  ename      job  mgr   hiredate  sal comm deptno
    1  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30
    2  7844 TURNER SALESMAN 7698 1981-09-08 1500    0     30

     


    6. Python pandasql Package

    • 비상관 서브쿼리
    Python Programming
    ps.sqldf("select * from emp where job='SALESMAN' and sal > (select max(sal) from emp where job='CLERK');")

     

    Results

     

      empno ename job mgr hiredate sal comm deptno
    0 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    1 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30

     


    7. R data.table Package

    • Max 함수
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[job == "SALESMAN" & sal > (DT[job == "CLERK", max(sal)]) ,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno
    1:  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30
    2:  7844 TURNER SALESMAN 7698 1981-09-08 1500    0     30

     


    8. SAS Proc SQL

    • 비상관 서브쿼리
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select ename 
        from   emp 
        where  job='SALESMAN' 
          and  sal > (select max(sal) from emp where job='CLERK');
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS ename
    1 ALLEN
    2 TURNER

     


    9. SAS Data Step

    • DATA STEP(CROSS JOIN-카테시안 곱);
    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=EMP_CLERK;
         BY DESCENDING SAL;
         WHERE job='CLERK';
    RUN;
    
    DATA EMP_SALESMAN;
     SET EMP;
         WHERE job='SALESMAN';
    RUN;
    
    DATA STATSAS_3; 
     SET EMP_CLERK(obs=1 RENAME=(SAL=SAL_CLERK)); 
         DO I=1 TO KOBS; 
            SET EMP_SALESMAN( RENAME=(SAL=SAL_SALESMAN)) NOBS=KOBS POINT=I;
            IF SAL_CLERK < SAL_SALESMAN THEN OUTPUT;
         END;    
    RUN;
    
    PROC PRINT;RUN;
    

     

    Results
    OBS empno ename job mgr hiredate SAL_CLERK comm deptno SAL_SALESMAN
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1300 300 30 1600
    2 7844 TURNER SALESMAN 7698 1981-09-08 1300 0 30 1500

     


    • Macro 변수
    SAS Programming
    %%SAS sas
    
    proc sql noprint;
      select max(sal) into :SAL_CLERK_MAX 
      from   EMP
      WHERE  job='CLERK';
    quit;
    
    DATA STATSAS_3;
     SET EMP;
         SAL_CLERK_MAX = &SAL_CLERK_MAX;
         IF SAL > &SAL_CLERK_MAX THEN OUTPUT;
         WHERE job='SALESMAN';
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno SAL_CLERK_MAX
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1300
    2 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1300

     


    10. Python Dfply Package

    • 52번 스칼라 추출 참고
    Python Programming
    ## Dfply
    
    @dfpipe
    def pull_fun(df, column=-1):
        return df.loc[:, column].item()
    
    emp >> filter_by(X.job == "SALESMAN" , X.sal > ( emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max()) >> pull_fun('sal_max') ) )
    

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    9 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글