본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[Having 절] 그룹별 합계 계산 후 결과 값이 특정 값 이상인 경우 - 49 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 16.

포스팅 목차

    49. Display the various jobs along with total sal for each of the jobs where total sal is greater than 4000.

     

    * 개별 직무별 급여 합계를 구한 후 그 합계값이 5000 초과인 직무와 급여 합계를 출력하시오.


    • Oracle : having 절
    • 파이썬 Pandas : groupby(), .query(), agg(), filter()
    • R 프로그래밍 : aggregate(), subset()
    • R Dplyr Package : group_by(), summarise(), filter()
    • R sqldf Package : having 절
    • Python pandasql Package : having 절
    • R data.table Package : .SD (Subset of Dataframe) , keyby
    • SAS Proc SQL : having 절
    • SAS Data Step : PROC SUMMARY, where 구문, FIRST.
    • Python Dfply Package : group_by(), summarize(), filter_by()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • Having 구문
    Oracle Programming
    select job, 
           sum(sal) sal_sum
    from   emp 
    group 
       by  job 
    having sum(sal)>5000;

     


    2. Python Pandas(파이썬)

    • Query() 함수
    Python Programming
    emp.groupby('job')['sal'].sum().reset_index().query('sal > 5000')

     

    Results
      job sal
    0 ANALYST 6000
    2 MANAGER 8275
    4 SALESMAN 5600

     


    • Query() 함수
    Python Programming
    emp.groupby('job').agg({'sal':'sum'}).reset_index().query('sal > 5000')

     

    Results
      job sal
    0 ANALYST 6000
    2 MANAGER 8275
    4 SALESMAN 5600

     


    • Filter() 함수
    Python Programming
    emp.groupby('job').filter(lambda x: x['sal'].sum() > 5000).groupby('job')['sal'].sum()

     

    Results
    job
    ANALYST     6000
    MANAGER     8275
    SALESMAN    5600
    Name: sal, dtype: int64

     


    3. R Programming (R Package)

    • Subset() 함수
    R Programming
    %%R
    
    subset( aggregate(sal ~ job, data=emp ,sum) , sal > 5000 )

     

    Results
           job  sal
    1  ANALYST 6000
    3  MANAGER 8275
    5 SALESMAN 5600

     


    • [참고] 집계함수 - Aggregate() 함수
    R Programming
    %%R
    
    aggregate(emp$sal,list(v1=emp$job),sum)

     

    Results
             v1 sum.emp$sal
    1   ANALYST        6000
    2     CLERK        4150
    3   MANAGER        8275
    4 PRESIDENT        5000
    5  SALESMAN        5600

     


    4. R Dplyr Package

    • Filter() 함수
    R Programming
    %%R
    
    emp %>%
      group_by(job) %>%
      summarise(sum_sal = sum(sal, na.rm = TRUE)) %>%
      filter(sum_sal > 5000)

     

    Results
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 3 x 2
      job      sum_sal
      <chr>      <dbl>
    1 ANALYST     6000
    2 MANAGER     8275
    3 SALESMAN    5600

     


    5. R sqldf Package

    • Having() 절
    R Programming
    %%R
    sqldf(" select job, sum(sal) sal_sum from emp group by job having sum(sal)>5000 ")

     

    Results
           job sal_sum
    1  ANALYST    6000
    2  MANAGER    8275
    3 SALESMAN    5600

     


    6. Python pandasql Package

    • Having() 절
    Python Programming
    ps.sqldf("select job, sum(sal) sal_sum from emp group by job having sum(sal)>5000")

     

    Results
      job sal_sum
    0 ANALYST 6000
    1 MANAGER 8275
    2 SALESMAN 5600

     


    7. R data.table Package

    • .SD
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, .SD[sum(sal, na.rm = TRUE) > 5000, .('sum_sal' = sum(sal,na.rm = TRUE))], keyby = .(job)]

     

    Results
            job sum_sal
    1:  ANALYST    6000
    2:  MANAGER    8275
    3: SALESMAN    5600

     


    8. SAS Proc SQL

    • Having() 절
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select job, sum(sal) AS SAL_SUM 
        from   emp 
        group 
           by  job 
        having sum(sal) > 5000;
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS job SAL_SUM
    1 ANALYST 6000
    2 MANAGER 8275
    3 SALESMAN 5600

     


    9. SAS Data Step

    • Where() 구문
    SAS Programming
    %%SAS sas
    
    PROC SUMMARY DATA=EMP NWAY;
         CLASS job;
         VAR   SAL;
         OUTPUT OUT=STATSAS_2(DROP=_: WHERE=(SAL_SUM > 5000)) SUM=SAL_SUM;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS job SAL_SUM
    1 ANALYST 6000
    2 MANAGER 8275
    3 SALESMAN 5600

     


    • Where() 구문
    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=EMP_1;
         BY job;
    RUN;
    
    DATA STATSAS_3(WHERE=(sal_SUM > 5000));
     SET EMP_1;
         BY job;
    
         IF FIRST.job THEN sal_SUM = SAL;
         ELSE              sal_SUM + SAL;  * SAL_SUM = SUM_SUL + SAL;
    
         IF LAST.job;                      * IF LAST.JOB THEN OUTPUT STATSAS_3;
         KEEP job sal_SUM;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS job sal_SUM
    1 ANALYST 6000
    2 MANAGER 8275
    3 SALESMAN 5600

     


    10. Python Dfply Package

    • Filter_by() 함수
    Python Programming
    emp >> group_by('job') >> summarize( sal_tot = X.sal.sum() ) >> filter_by(X.sal_tot > 5000)

     

    Results
      job sal_tot
    0 ANALYST 6000
    2 MANAGER 8275
    4 SALESMAN 5600

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

    반응형

    댓글