[기초 통계량 - 총건수(Count) 계산] 그룹별 건수 집계

포스팅 목차

42. Display dept numbers and total number of employees within each group.

* 개별 부서에 근무하는 직원의 총 인원수를 출력하시오.

파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[그룹별 집계함수] 그룹별 관측치 및 데이터 건수 집계(건수 카운트)
[데이터 관리] 8. By 또는 Split 파일 프로세싱 : [링크]

Oracle : count(*)
파이썬 Pandas : groupby(), describe(), reset_index(), size(), agg(), pd.series.nunique, value_counts(), subset=, rp.summary_cont(), groupby
R 프로그래밍 : aggregate(), length, function, table, tapply
R Dplyr Package : dplyr::group_by(), dplyr::count(), tally()
R sqldf Package : group by, count(*)
Python pandasql Package : group by, count(*)
R data.table Package : .N, keyby=
SAS Proc SQL : group by, count(*)
SAS Data Step : proc summary, N=
Python Dfply Package : group_by(), summarize(), .count()
파이썬 Base 프로그래밍 :

1. 오라클(Oracle)

Group by 와 count(*) 함수

Oracle Programming

select deptno,count(*) as emp_cnt
from   emp 
group 
   by  deptno

2. 파이썬(Pandas)

Groupby 와 집계 함수

Python Programming

emp.groupby(emp['deptno'])['deptno'].describe()[['count', 'mean']].reset_index()

Results

	deptno	count	mean
0	10	3.0	10.0
1	20	5.0	20.0
2	30	6.0	30.0

Groupby 와 size() 함수

Python Programming

emp.groupby(['deptno']).size().reset_index(name='counts')

Results

	deptno	counts
0	10	3
1	20	5
2	30	6

Groupby 와 집계 함수(agg 함수)

Python Programming

emp[['deptno']].groupby(emp['deptno']).agg(['count'])

Results

	deptno
	count
---	---
deptno
---	---
10	3
20	5
30	6

Groupby 와 집계 함수(agg 함수)

Python Programming

emp.groupby('deptno').agg({'deptno':pd.Series.nunique, 'deptno':['count']})

Results

	deptno
	count
---	---
deptno
---	---
10	3
20	5
30	6

value_counts (pandas=1.1 이후 버젼에서 사용가능)

Python Programming

emp.value_counts(subset=['deptno'])

Results

deptno
30        6
20        5
10        3
dtype: int64

Groupby 와 summary_cont 함수

Python Programming

rp.summary_cont(emp.groupby(['deptno'])[['deptno']])

Results

C:\Users\BACK\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:2003: RuntimeWarning: invalid value encountered in multiply
  lower_bound = _a * scale + loc
C:\Users\BACK\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:2004: RuntimeWarning: invalid value encountered in multiply
  upper_bound = _b * scale + loc

	deptno
	N
---	---
deptno
---	---
10	3
20	5
30	6

3. R Programming (R Package)

by 와 집계 함수(Aggregate 함수)

R Programming

%%R

aggregate(emp$empno, by=list(emp$deptno), FUN=length)

Results

  Group.1 length
1      10      3
2      20      5
3      30      6

Aggregate 함수

R Programming

%%R

aggregate(empno~deptno,FUN=length,data=emp)

Results

  deptno empno
1     10     3
2     20     5
3     30     6

집계 함수(Aggregate 함수) 와 length() 함수

R Programming

%%R

aggregate(empno ~ deptno, data = emp, FUN = function(x) c(max = length(x) ) )

Results

  deptno empno
1     10     3
2     20     5
3     30     6

table() 함수의 리스트

R Programming

%%R

table(emp$deptno)

Results

10 20 30 
 3  5  6

tapply 함수와 length 연산

R Programming

%%R

tapply(emp$deptno, emp$deptno, length)

Results

10 20 30 
 3  5  6

4. R Dplyr Package

group_by 와 count() 함수

R Programming

%%R

emp %>%
  dplyr::group_by(deptno) %>% 
  dplyr::count()

Results

# A tibble: 3 x 2
# Groups:   deptno [3]
  deptno     n
   <dbl> <int>
1     10     3
2     20     5
3     30     6

5. R sqldf Package

group by 와 count() 함수

R Programming

%%R
sqldf("select deptno,count(*) emp_cnt
       from emp 
       group by deptno")

Results

  deptno emp_cnt
1     10       3
2     20       5
3     30       6

6. Python pandasql Package

group by 와 count() 함수

Python Programming

ps.sqldf("select deptno,count(*) emp_cnt from emp group by deptno")

Results

	deptno	emp_cnt
0	10	3
1	20	5
2	30	6

7. R data.table Package

keyby= 와 .N 연산

R Programming

%%R

DT <- data.table(emp)

DT[, .('count' = .N), keyby = list(deptno)]

Results

   deptno count
1:     10     3
2:     20     5
3:     30     6

8. SAS Proc SQL

group by 와 count() 함수

SAS Programming

%%SAS sas

PROC SQL;
  CREATE TABLE STATSAS_1 AS
    SELECT deptno,count(*) AS emp_cnt
    from   emp 
    group 
       by  1;
QUIT;
PROC PRINT data=STATSAS_1(obs=3);RUN;

Results

OBS	deptno	emp_cnt
1	10	3
2	20	5
3	30	6

9. SAS Data Step

Proc summary 프로시져의 N= 연산

SAS Programming

%%SAS sas

PROC SUMMARY DATA=EMP NWAY;
     CLASS DEPTNO;
     VAR   SAL;
     OUTPUT OUT=EMP_COUNT(DROP=_:) N=emp_cnt;
RUN;
PROC PRINT;RUN;

Results

OBS	deptno	emp_cnt
1	10	3
2	20	5
3	30	6

10. Python Dfply Package

group_by 와 집계 함수(Summarize 함수)

Python Programming

emp >> group_by('deptno') >> summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno)  )

Results

	deptno	emp_cnt	emp_cnt1	emp_cnt2
0	10	3	3	3
1	20	5	5	5
2	30	6	6	6

[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

저작자표시

'통계프로그램 비교 시리즈 > 프로그래밍비교(Oracle,Python,R,SAS)' 카테고리의 다른 글

[기초 통계량 - 그룹별 총합계(Total) 계산] 그룹별 합계 집계 - 44 (오라클 SQL, R, Python, SAS) (0)	2021.08.10
[기초 통계량 - 그룹별 총건수(Count)] 그룹별 건수 집계 - 43 (오라클 SQL, R, Python, SAS) (0)	2021.08.10
[기초 통계량 - 총합계(Total) 계산] 연산작업을 통한 신규변수 생성 - 41 (오라클 SQL, R, Python, SAS) (0)	2021.08.10
[데이터 정렬 - 조건절 기준 정렬] 신규 변수를 기준으로 데이터 내림차순 정렬 - 40 (오라클 SQL, R, Python, SAS) (0)	2021.08.09
[데이터 정렬 - 다중 변수 기준 정렬] 여러 개의 정렬 키를 기준으로 데이터 정렬 - 39 (오라클 SQL, R, Python, SAS) (0)	2021.08.09

[기초 통계량 - 총건수(Count) 계산] 그룹별 건수 집계 - 42 (오라클 SQL, R, Python, SAS)

42. Display dept numbers and total number of employees within each group.

1. 오라클(Oracle)

2. 파이썬(Pandas)

3. R Programming (R Package)

4. R Dplyr Package

5. R sqldf Package

6. Python pandasql Package

7. R data.table Package

8. SAS Proc SQL

9. SAS Data Step

10. Python Dfply Package

'통계프로그램 비교 시리즈 > 프로그래밍비교(Oracle,Python,R,SAS)' 카테고리의 다른 글

댓글

티스토리툴바

[기초 통계량 - 총건수(Count) 계산] 그룹별 건수 집계 - 42 (오라클 SQL, R, Python, SAS)

42. Display dept numbers and total number of employees within each group.

1. 오라클(Oracle)

2. 파이썬(Pandas)

3. R Programming (R Package)

4. R Dplyr Package

5. R sqldf Package

6. Python pandasql Package

7. R data.table Package

8. SAS Proc SQL

9. SAS Data Step

10. Python Dfply Package

'통계프로그램 비교 시리즈 > 프로그래밍비교(Oracle,Python,R,SAS)' 카테고리의 다른 글

관련글

댓글

티스토리툴바