본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[기초 통계량 - 그룹별 총건수(Count)] 그룹별 건수 집계 - 43 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 10.

포스팅 목차

     

    43. Display the various jobs and total number of employees with each job group.

     

    * 개별 직무에 속한 직원의 총 인원수를 출력하시오.


    • Oracle : group by, count(*)
    • 파이썬 Pandas : groupby(), describe(), reset_index(), size(), agg(), pd.series.nunique, value_counts(), subset=, rp.summary_cont(), groupby
    • R 프로그래밍 : aggregate(), length, function, table, tapply
    • R Dplyr Package : dplyr::group_by(), dplyr::count(), tally()
    • R sqldf Package : group by, count(*)
    • Python pandasql Package : group by, count(*)
    • R data.table Package : .N, keyby=
    • SAS Proc SQL : group by, count(*)
    • SAS Data Step : proc summary, N=, first., last.
    • Python Dfply Package : group_by(), summarize(), .count(), .nunique(), .n()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • Count() 함수
    Oracle Programming
    select job, count(*) as emp_cnt
    from   emp 
    group 
       by  job;

     


    2. Python Pandas(파이썬)

    • value_counts (pandas=1.1 이후 버젼에서 사용가능)
    Python Programming
    emp.value_counts(subset=['job']).reset_index(name='counts')

     

    Results
      job counts
    0 SALESMAN 4
    1 CLERK 4
    2 MANAGER 3
    3 ANALYST 2
    4 PRESIDENT 1

     


    3. R Programming (R Package)

    • Aggregate() 함수 와 length
    R Programming
    %%R
    
    aggregate(empno~job,FUN=length,data=emp)

     

    Results
            job empno
    1   ANALYST     2
    2     CLERK     4
    3   MANAGER     3
    4 PRESIDENT     1
    5  SALESMAN     4

     


    4. R Dplyr Package

    • Count() 함수
    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(job) %>% 
      dplyr::count( name = 'emp_cnt' )

     

    Results
    # A tibble: 5 x 2
    # Groups:   job [5]
      job       emp_cnt
      <chr>       <int>
    1 ANALYST         2
    2 CLERK           4
    3 MANAGER         3
    4 PRESIDENT       1
    5 SALESMAN        4

     


    5. R sqldf Package

    • Count() 함수
    R Programming
    %%R
    sqldf(" select job, count(*) emp_cnt 
            from emp 
            group by job")

     

    Results
            job emp_cnt
    1   ANALYST       2
    2     CLERK       4
    3   MANAGER       3
    4 PRESIDENT       1
    5  SALESMAN       4

     


    6. Python pandasql Package

    • Count() 함수
    Python Programming
    ps.sqldf("select job, count(*) from emp group by job")

     

    Results
      job count(*)
    0 ANALYST 2
    1 CLERK 4
    2 MANAGER 3
    3 PRESIDENT 1
    4 SALESMAN 4

     


    7. R data.table Package

    • .N 함수
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, .N, by = list(job)]

     

    Results
             job N
    1:     CLERK 4
    2:  SALESMAN 4
    3:   MANAGER 3
    4:   ANALYST 2
    5: PRESIDENT 1

     


    8. SAS Proc SQL

    • Count() 함수
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select job, count(*) as emp_cnt
        from   emp 
        group 
           by  job;
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS job emp_cnt
    1 ANALYST 2
    2 CLERK 4
    3 MANAGER 3
    4 PRESIDEN 1
    5 SALESMAN 4

     


    9. SAS Data Step

    • Proc summary 프로시져와 N
    SAS Programming
    %%SAS sas
    
    PROC SUMMARY DATA=EMP NWAY;
         CLASS job;
         VAR   SAL;
         OUTPUT OUT=EMP_COUNT(DROP=_:) N=emp_cnt;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS job emp_cnt
    1 ANALYST 2
    2 CLERK 4
    3 MANAGER 3
    4 PRESIDEN 1
    5 SALESMAN 4

     


     

    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=EMP_1;
         BY job;
    RUN;
    
    DATA emp_cnt;
     SET EMP_1;
         BY job;
    
         IF FIRST.job THEN emp_cnt = 1;
         ELSE emp_cnt + 1;
    
         IF LAST.job THEN OUTPUT emp_cnt;
         KEEP DEPTNO emp_cnt;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS deptno emp_cnt
    1 20 2
    2 10 4
    3 10 3
    4 10 1
    5 30 4

     


    10. Python Dfply Package

     

    • Summarize() 함수 와 count() / Nunique() / N()
    Python Programming
    emp >> group_by('job') >> summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) )

     

    Results
      job emp_cnt emp_cnt1 emp_cnt2
    0 ANALYST 2 2 2
    1 CLERK 4 4 4
    2 MANAGER 3 3 3
    3 PRESIDENT 1 1 1
    4 SALESMAN 4 4 4

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글