본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[having count() 구문] 그룹별 건수 계산 후 계산 값이 특정 값 이상인 경우 - 50 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 16.

포스팅 목차

    50. Display the various jobs along with total number of employees in each job. The output should contain only those jobs with more than three employees.

     

    * 개별 직무에 속한 직원의 숫자가 3명 이상인 부서만 출력하라.


    • Oracle : having count(*) 절
    • 파이썬 Pandas : groupby(), .query(), agg(), filter()
    • R 프로그래밍 : aggregate(), subset(), FUN = function(x) (사용자 정의 함수)
    • R Dplyr Package : group_by(), summarise(), filter(),
    • R sqldf Package : having count(*) 절
    • Python pandasql Package : having count(*) 절
    • R data.table Package : .SD (Subset of Dataframe) , keyby
    • SAS Proc SQL : having count(*) 절
    • SAS Data Step : PROC SUMMARY, where 구문, FIRST.
    • Python Dfply Package : group_by(), summarize(), filter_by(), ungroup()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • having절과 count(*) 함수
    Oracle Programming
    select job, 
           count(*) emp_cnt
    from   emp 
    group 
       by  job 
    having count(*)>3

     


    2. Python Pandas(파이썬)

    • groupby함수와 count(*) 함수
    Python Programming
    emp.groupby('job')['empno'].count().reset_index().query('empno > 3')

     

    Results
      job empno
    1 CLERK 4
    4 SALESMAN 4

    • groupby함수와 agg 함수의 count 인수
    Python Programming
    emp.groupby('job').agg({'empno':'count'}).reset_index().query('empno > 3')

     

    Results
      job empno
    1 CLERK 4
    4 SALESMAN 4

    • groupby함수와 filter 함수
    Python Programming
    emp.groupby('job').filter(lambda x: x['empno'].count() > 3).groupby('job')['empno'].count()

     

    Results
    job
    CLERK       4
    SALESMAN    4
    Name: empno, dtype: int64

     


    3. R Programming (R Package)

    • aggregate함수와 length 함수
    R Programming
    %%R
    
    subset( aggregate(sal ~ job, data = emp, FUN = function(x) c(count_sal = length(x) ) ), sal > 3)

     

    Results
           job sal
    2    CLERK   4
    5 SALESMAN   4

     


    • aggregate함수와 length 함수
    R Programming
    %%R
    
    subset( aggregate(sal ~ job, data = emp, function(x) data.frame(count = length(x))), sal>3)

     

    Results
           job sal
    2    CLERK   4
    5 SALESMAN   4

     


    4. R Dplyr Package

    • Group_by 함수와 Summarise 함수
    R Programming
    %%R
    
    emp %>%
      group_by(job) %>% summarise( n_sal=n() ) %>% filter(n_sal > 3)

     

    Results
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 2 x 2
      job      n_sal
      <chr>    <int>
    1 CLERK        4
    2 SALESMAN     4

     


    5. R sqldf Package

    • having 절과 count(*) 함수
    R Programming
    %%R
    sqldf("select job, count(*) emp_cnt from emp group by job having count(*)>3")

     

    Results
           job emp_cnt
    1    CLERK       4
    2 SALESMAN       4

     


    6. Python pandasql Package

    • having 절과 count(*) 함수
    Python Programming
    ps.sqldf("select job, count(*) emp_cnt from emp group by job having count(*)>3")

     

    Results
      job emp_cnt
    0 CLERK 4
    1 SALESMAN 4

     


    7. R data.table Package

    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, .SD[.N > 3, .('emp_cnt' = .N)], keyby = .(job)]

     

    Results
            job emp_cnt
    1:    CLERK       4
    2: SALESMAN       4

     


    8. SAS Proc SQL

    • having 절과 count(*) 함수
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select job, count(*) AS EMP_CNT
        from   emp 
        group 
           by  1
        having count(*)>3;
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS job EMP_CNT
    1 CLERK 4
    2 SALESMAN 4

     


    9. SAS Data Step

    • Proc Summary 프로시져
    SAS Programming
    %%SAS sas
    
    PROC SUMMARY DATA=EMP NWAY;
         CLASS job;
         VAR   SAL;
         OUTPUT OUT=STATSAS_2(DROP=_: WHERE=(EMP_CNT > 3)) N=EMP_CNT;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS job EMP_CNT
    1 CLERK 4
    2 SALESMAN 4

    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=EMP_1;
         BY job;
    RUN;
    
    DATA STATSAS_3(WHERE=(EMP_CNT > 3));
     SET EMP_1;
         BY job;
    
         IF FIRST.job THEN EMP_CNT = 1;
         ELSE              EMP_CNT + 1;  * EMP_CNT = EMP_CNT + SAL;
    
         IF LAST.job;                      * IF LAST.JOB THEN OUTPUT STATSAS_3;
         KEEP job EMP_CNT;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS job EMP_CNT
    1 CLERK 4
    2 SALESMAN 4

     


    10. Python Dfply Package

    • groupb_by 함수 summarize 함수
    Python Programming
    emp >>                                                                                           \
      group_by('job') >>                                                                             \
      summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) ) >> \
      ungroup() >>                                                                                   \
      filter_by( X.emp_cnt > 3 )

     

    Results
      job emp_cnt emp_cnt1 emp_cnt2
    1 CLERK 4 4 4
    4 SALESMAN 4 4 4

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글