본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[기초 통계량 - 그룹별 MIN/MAX 계산] 그룹별 최대값과 최소값 집계 - 45 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 10.

포스팅 목차

     

    45. Display department numbers and maximum salary for each department.

     

    * 개별 부서에 근무하는 직원의 최대 급여와 최소 급여를 출력하시오.


    • [그룹별 집계함수] 그룹별 최대값과 최소값 집계
    • R stackoverflow(Apply several summary functions on several variables by group in one call) : [링크]
    • Python stackoverflow(Apply multiple functions to multiple groupby columns) : [링크]
    • Oracle : group by, min(), max()
    • 파이썬 Pandas : groupby(), agg(), reset_index(), np.min, np.max
    • R 프로그래밍 : aggregate(), list, min, max, 사용자 정의 함수, aggregate(), plyr::each()
    • R Dplyr Package : dplyr::group_by(), dplyr::summarise, min(), max()
    • R sqldf Package : group by, min(), max()
    • Python pandasql Package : group by, min(), max()
    • R data.table Package : min(), max(), by=
    • SAS Proc SQL : group by, min(), max()
    • SAS Data Step : proc summary, min=, max()=, first., last.
    • Python Dfply Package : group_by(), summarize(), .min(), .max()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

     

    Max() 함수와 Min() 함수

     

    Oracle Programming
    select deptno, 
           max(sal) max_sal,
           min(sal) min_sal
    from emp 
    group by deptno;

     


    2. Python Pandas(파이썬)

     

    Agg() 함수와 np.max, np.min 인수

     

    Python Programming
    emp['sal'].groupby(emp['deptno']).agg([np.max, np.min]).reset_index()

     

    Results
      deptno amax amin
    0 10 5000 1300
    1 20 3000 800
    2 30 2850 950

     


    Agg() 함수와 max, min 인수

     

    Python Programming
    emp.groupby(['deptno'])['sal'].agg(['max','min']).reset_index()

     

    Results
      deptno max min
    0 10 5000 1300
    1 20 3000 800
    2 30 2850 950

     


    Agg() 함수와 max, min 인수

     

    Python Programming
    emp.groupby('deptno').agg({'sal':['max', 'min']})

     

    Results
      sal
      max
    --- ---
    deptno  
    --- ---
    10 5000
    20 3000
    30 2850

     


    Agg() 함수와 max, min 인수

     

    Python Programming
    emp.groupby('deptno').agg( sal_max  =('sal', 'max'),
                               sal_min  =('sal', 'min'),
                               sal_mean =('sal', 'mean'),
                               sal_sum  =('sal', 'sum'),
                               sal_range=('sal', lambda x: x.max() - x.min())
                             ).reset_index()

     

    Results
      deptno sal_max sal_min sal_mean sal_sum sal_range
    0 10 5000 1300 2916.666667 8750 3700
    1 20 3000 800 2175.000000 10875 2200
    2 30 2850 950 1566.666667 9400 1900

     


    Apply() 함수와 max, min 함수

     

    Python Programming
    emp.groupby('deptno') \
      .apply(lambda x: pd.Series({ 'max_sal'      : x['sal'].max(),
                                   'min_sal'      : x['sal'].min(),
                                   'mean_sal'     : x['sal'].mean(),
                                   'TOT_prodsum'  : (x['sal'] * x['comm']).sum()
                                  })
            ).reset_index()

     

    Results
      deptno max_sal min_sal mean_sal TOT_prodsum
    0 10 5000.0 1300.0 2916.666667 0.0
    1 20 3000.0 800.0 2175.000000 0.0
    2 30 2850.0 950.0 1566.666667 2855000.0

     


    3. R Programming (R Package)

     

    Aggregate() 함수와 max, min 인수

     

    R Programming
    %%R
    library(epiDisplay) # 에러 방지('utf-8' codec can't decode byte 0xc0 in position 46: invalid start byte)
    
    aggregate(emp$sal, by = list(Deptno = emp$deptno), FUN = c("max","min"), "na.rm"=FALSE)

     

    Results
      Deptno max.emp.sal min.emp$sal
    1     10        5000        1300
    2     20        3000         800
    3     30        2850         950

     


    Aggregate() 함수와 max(), min() 함수

     

    R Programming
    %%R
    
    aggregate(sal ~ deptno, data = emp, FUN = function(x) c(max = max(x), min = min(x) ) )

     

    Results
      deptno sal.max sal.min
    1     10    5000    1300
    2     20    3000     800
    3     30    2850     950

     


    Aggregate() 함수와 max, min 함수

     

    R Programming
    %%R
    
    aggregate(sal ~ deptno, data = emp, FUN = plyr::each(max = max, min = min))

     

    Results
      deptno sal.max sal.min
    1     10    5000    1300
    2     20    3000     800
    3     30    2850     950

     


    4. R Dplyr Package

     

    Summarise() 함수와 max, min 함수

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>% 
      dplyr::summarise( sal_mac = max(sal), sal_min = min(sal) )

     

    Results
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 3 x 3
      deptno sal_mac sal_min
       <dbl>   <dbl>   <dbl>
    1     10    5000    1300
    2     20    3000     800
    3     30    2850     950

     


    5. R sqldf Package

     

    Max() 함수와 Min() 함수

     

    R Programming
    %%R
    sqldf(" select deptno, max(sal) max_sal,min(sal) min_sal 
            from   emp 
            group by deptno")

     

    Results
      deptno max_sal min_sal
    1     10    5000    1300
    2     20    3000     800
    3     30    2850     950

     


    6. Python pandasql Package

     

    Max() 함수와 Min() 함수

     

    Python Programming
    ps.sqldf("select deptno, max(sal) max_sal,min(sal) min_sal from emp group by deptno")

     

    Results
      deptno max_sal min_sal
    0 10 5000 1300
    1 20 3000 800
    2 30 2850 950

     


    7. R data.table Package

     

    Max() 함수와 Min() 함수

     

    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[ , .(sal_max = max(sal), sal_min = min(sal), sal_cnt = .N), by = .(deptno)] 

     

    Results
       deptno sal_max sal_min sal_cnt
    1:     20    3000     800       5
    2:     30    2850     950       6
    3:     10    5000    1300       3

     


    8. SAS Proc SQL

     

    Max() 함수와 Min() 함수

     

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select deptno, 
               max(sal) AS max_sal,
               min(sal) AS min_sal
        from   emp 
        group 
           by  deptno;
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS deptno max_sal min_sal
    1 10 5000 1300
    2 20 3000 800
    3 30 2850 950

     


    9. SAS Data Step

     

    Proc summary 프로시져와 max, min 인수

     

    SAS Programming
    %%SAS sas
    
    PROC SUMMARY DATA=EMP NWAY;
         CLASS deptno;
         VAR   SAL;
         OUTPUT OUT=STATSAS_2(DROP=_:) MAX=MAX_SAL MIN=MIN_SAL;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS deptno MAX_SAL MIN_SAL
    1 10 5000 1300
    2 20 3000 800
    3 30 2850 950

     


    Max() 함수와 Min() 함수

     

    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=EMP_1;
         BY deptno;
    RUN;
    
    DATA STATSAS_3;
     SET EMP_1;
         BY deptno;
    
         RETAIN MAX_SAL MIN_SAL 0;
    
         IF FIRST.deptno THEN DO;
            MAX_SAL = SAL;
            MIN_SAL = SAL;
         END;
         ELSE DO;
            MAX_SAL =MAX(MAX_SAL,SAL);
            MIN_SAL =MIN(MIN_SAL,SAL);
         END;
    
         IF LAST.deptno THEN OUTPUT STATSAS_3;  * IF LAST.job;
         KEEP DEPTNO MAX_SAL MIN_SAL;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS deptno MAX_SAL MIN_SAL
    1 10 5000 1300
    2 20 3000 800
    3 30 2850 950

     


    10. Python Dfply Package

     

    Summarize() 함수와 max, min 함수

     

    Python Programming
    emp >> group_by('deptno') >> summarize( sal_max = X.sal.max(), sal_min = X.sal.min() )

     

    Results
      deptno sal_max sal_min
    0 10 5000 1300
    1 20 3000 800
    2 30 2850 950

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글