본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[Having 절 - 그룹핑 결과 연산 작업 수행] 그룹별 집계 후 조건절에 만족하는 그룹 선택 - 48 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 15.

포스팅 목차

     

    48. Display the department numbers with more than three employees in each dept.

     

    * 개별 부서에 속한 직원의 숫자가 3명 이상인 부서만 출력하라.


    • Oracle : group by, count(*), having
    • 파이썬 Pandas : groupby(), .count(), .query(), .count(), .loc, 슬라이싱(Slicing), .max()
    • R 프로그래밍 : aggregate(), 사용자 정의 함수, length, function, subset()
    • R Dplyr Package : dplyr::group_by(), dplyr::summarise(), n(), filter()
    • R sqldf Package : group by, count(*), having
    • Python pandasql Package : group by, count(*), having
    • R data.table Package : .N, keyby=. .SD (Subset)
    • SAS Proc SQL : group by, count(*), having
    • SAS Data Step : proc summary, N=, where=, first., last.
    • Python Dfply Package : group_by(), summarize(), .count(), .nunique(), .n(), ungroup(), filter_by()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • Group by 절과 having 절
    Oracle Programming
    select deptno, 
           count(*) emp_cnt
    from   emp 
    group 
       by  deptno 
    having count(*) > 3;

     


    2. Python Pandas(파이썬)

    • Groupby 함수과 Query() 함수
    Python Programming
    emp.groupby('deptno')['sal'].count().reset_index().query('sal > 3')

     

    Results
      deptno sal
    1 20 5
    2 30 6

     


    • Groupby 함수와 관측치 선택
    Python Programming
    emp1 = emp['sal'].groupby(emp['deptno']).count()
    emp1.loc[emp1 > 3,]

     

    Results
    deptno
    20    5
    30    6
    Name: sal, dtype: int64

     


    • [참고] Groupby 함수와 관측치 선택
    Python Programming
    emp1 = emp['sal'].groupby(emp['deptno']).count()
    emp1.loc[emp1==emp1.max(),]

     

    Results
    deptno
    30    6
    Name: sal, dtype: int64

     


    3. R Programming (R Package)

    • 그룹별 집계함수(Aggregate() 함수) 와 subset 함수
    R Programming
    %%R
    
    subset( aggregate(sal ~ deptno, data = emp, FUN = function(x) c(count_sal = length(x) ) ), sal >3 )

     

    Results
      deptno sal
    2     20   5
    3     30   6

     


    4. R Dplyr Package

    • Group_by 함수과 Filter() 함수
    R Programming
    %%R
    
    emp %>%
      group_by(deptno) %>% summarise( n_sal=n() ) %>% filter(n_sal > 3)

     

    Results
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 2 x 2
      deptno n_sal
       <dbl> <int>
    1     20     5
    2     30     6

     


    5. R sqldf Package

    • Group by 절과 having 절
    R Programming
    %%R
    sqldf("select deptno, count(*) from emp group by deptno having count(*)>3")

     

    Results
      deptno count(*)
    1     20        5
    2     30        6

     


    6. Python pandasql Package

    • Group by 절과 having 절
    Python Programming
    ps.sqldf("select deptno, count(*) from emp group by deptno having count(*)>3")

     

    Results
      deptno count(*)
    0 20 5
    1 30 6

     


    7. R data.table Package

    • keyby= 구문과 .SD
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, .SD[.N > 3, .('N_count' = .N)], keyby = .(deptno)]

     

    Results
       deptno N_count
    1:     20       5
    2:     30       6

     


    8. SAS Proc SQL

    • Group by 절과 having 절
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select deptno, count(*) as sal_cnt
        from   emp 
        group 
           by  deptno
        having count(*) > 3;;
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS deptno sal_cnt
    1 20 5
    2 30 6

     


    9. SAS Data Step

    • Proc summary 프로시져와 Where= 인수
    SAS Programming
    %%SAS sas
    
    PROC SUMMARY DATA=EMP NWAY;
         CLASS deptno;
         VAR   SAL;
         OUTPUT OUT=STATSAS_2(DROP=_: WHERE=(sal_cnt>3)) n=sal_cnt;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS deptno sal_cnt
    1 20 5
    2 30 6

     


    • By 구문과 Where= 인수
    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=EMP_1;
         BY deptno;
    RUN;
    
    DATA STATSAS_3(WHERE=(sal_cnt > 3));
     SET EMP_1;
         BY deptno;
    
         IF FIRST.deptno THEN sal_cnt = 1;
         ELSE DO;
            sal_cnt+1;
         END;
    
         IF LAST.deptno THEN OUTPUT STATSAS_3;  * IF LAST.deptno;
         KEEP DEPTNO sal_cnt;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS deptno sal_cnt
    1 20 5
    2 30 6

     


    10. Python Dfply Package

    • Group_by 함수와 fylter_by() 함수
    Python Programming
    emp >>                                                                                           \
      group_by('deptno') >>                                                                          \
      summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) ) >> \
      ungroup() >>                                                                                   \
      filter_by( X.emp_cnt > 3 )

     

    Results
      deptno emp_cnt emp_cnt1 emp_cnt2
    1 20 5 5 5
    2 30 6 6 6

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글