본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[기초 통계량 - 총건수(Count) 계산] 그룹별 건수 집계 - 42 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 10.

포스팅 목차

    42. Display dept numbers and total number of employees within each group.

     

    * 개별 부서에 근무하는 직원의 총 인원수를 출력하시오.


    • Oracle : count(*)
    • 파이썬 Pandas : groupby(), describe(), reset_index(), size(), agg(), pd.series.nunique, value_counts(), subset=, rp.summary_cont(), groupby
    • R 프로그래밍 : aggregate(), length, function, table, tapply
    • R Dplyr Package : dplyr::group_by(), dplyr::count(), tally()
    • R sqldf Package : group by, count(*)
    • Python pandasql Package : group by, count(*)
    • R data.table Package : .N, keyby=
    • SAS Proc SQL : group by, count(*)
    • SAS Data Step : proc summary, N=
    • Python Dfply Package : group_by(), summarize(), .count()
    • 파이썬 Base 프로그래밍 :

     


    1. 오라클(Oracle)

    • Group by 와 count(*) 함수
    Oracle Programming
    select deptno,count(*) as emp_cnt
    from   emp 
    group 
       by  deptno

     


    2. 파이썬(Pandas)

    • Groupby 와 집계 함수
    Python Programming
    emp.groupby(emp['deptno'])['deptno'].describe()[['count', 'mean']].reset_index()

     

    Results
      deptno count mean
    0 10 3.0 10.0
    1 20 5.0 20.0
    2 30 6.0 30.0

     


    • Groupby 와 size() 함수
    Python Programming
    emp.groupby(['deptno']).size().reset_index(name='counts')

     

    Results
      deptno counts
    0 10 3
    1 20 5
    2 30 6

     


    • Groupby 와 집계 함수(agg 함수)
    Python Programming
    emp[['deptno']].groupby(emp['deptno']).agg(['count'])

     

    Results
      deptno
      count
    --- ---
    deptno  
    --- ---
    10 3
    20 5
    30 6

     


    • Groupby 와 집계 함수(agg 함수)
    Python Programming
    emp.groupby('deptno').agg({'deptno':pd.Series.nunique, 'deptno':['count']})

     

    Results
      deptno
      count
    --- ---
    deptno  
    --- ---
    10 3
    20 5
    30 6

     


    • value_counts (pandas=1.1 이후 버젼에서 사용가능)
    Python Programming
    emp.value_counts(subset=['deptno'])

     

    Results
    deptno
    30        6
    20        5
    10        3
    dtype: int64

     


    • Groupby 와 summary_cont 함수
    Python Programming
    rp.summary_cont(emp.groupby(['deptno'])[['deptno']])

     

    Results
    C:\Users\BACK\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:2003: RuntimeWarning: invalid value encountered in multiply
      lower_bound = _a * scale + loc
    C:\Users\BACK\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:2004: RuntimeWarning: invalid value encountered in multiply
      upper_bound = _b * scale + loc

     

      deptno
      N
    --- ---
    deptno  
    --- ---
    10 3
    20 5
    30 6

     


    3. R Programming (R Package)

    • by 와 집계 함수(Aggregate 함수)
    R Programming
    %%R
    
    aggregate(emp$empno, by=list(emp$deptno), FUN=length)

     

    Results
      Group.1 length
    1      10      3
    2      20      5
    3      30      6

     


    • Aggregate 함수
    R Programming
    %%R
    
    aggregate(empno~deptno,FUN=length,data=emp)

     

    Results
      deptno empno
    1     10     3
    2     20     5
    3     30     6

     


    • 집계 함수(Aggregate 함수) 와 length() 함수
    R Programming
    %%R
    
    aggregate(empno ~ deptno, data = emp, FUN = function(x) c(max = length(x) ) )

     

    Results
      deptno empno
    1     10     3
    2     20     5
    3     30     6

     


    • table() 함수의 리스트
    R Programming
    %%R
    
    table(emp$deptno)

     

    Results
    10 20 30 
     3  5  6 

     


    • tapply 함수와 length 연산
    R Programming
    %%R
    
    tapply(emp$deptno, emp$deptno, length)

     

    Results
    10 20 30 
     3  5  6 

     


    4. R Dplyr Package

    • group_by 와 count() 함수
    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>% 
      dplyr::count()

     

    Results
    # A tibble: 3 x 2
    # Groups:   deptno [3]
      deptno     n
       <dbl> <int>
    1     10     3
    2     20     5
    3     30     6

     


    5. R sqldf Package

    • group by 와 count() 함수
    R Programming
    %%R
    sqldf("select deptno,count(*) emp_cnt
           from emp 
           group by deptno")

     

    Results
      deptno emp_cnt
    1     10       3
    2     20       5
    3     30       6

     


    6. Python pandasql Package

    • group by 와 count() 함수
    Python Programming
    ps.sqldf("select deptno,count(*) emp_cnt from emp group by deptno")

     

    Results
      deptno emp_cnt
    0 10 3
    1 20 5
    2 30 6

     


    7. R data.table Package

    • keyby= 와 .N 연산
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, .('count' = .N), keyby = list(deptno)]

     

    Results
       deptno count
    1:     10     3
    2:     20     5
    3:     30     6

     


    8. SAS Proc SQL

    • group by 와 count() 함수
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        SELECT deptno,count(*) AS emp_cnt
        from   emp 
        group 
           by  1;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=3);RUN;

     

    Results
    OBS deptno emp_cnt
    1 10 3
    2 20 5
    3 30 6

     


    9. SAS Data Step

    • Proc summary 프로시져의 N= 연산
    SAS Programming
    %%SAS sas
    
    PROC SUMMARY DATA=EMP NWAY;
         CLASS DEPTNO;
         VAR   SAL;
         OUTPUT OUT=EMP_COUNT(DROP=_:) N=emp_cnt;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS deptno emp_cnt
    1 10 3
    2 20 5
    3 30 6

     


    10. Python Dfply Package

    • group_by 와 집계 함수(Summarize 함수)
    Python Programming
    emp >> group_by('deptno') >> summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno)  )

     

    Results
      deptno emp_cnt emp_cnt1 emp_cnt2
    0 10 3 3 3
    1 20 5 5 5
    2 30 6 6 6

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

     

    반응형

    댓글