포스팅 목차
42. Display dept numbers and total number of employees within each group.
* 개별 부서에 근무하는 직원의 총 인원수를 출력하시오.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- [그룹별 집계함수] 그룹별 관측치 및 데이터 건수 집계(건수 카운트)
- [데이터 관리] 8. By 또는 Split 파일 프로세싱 : [링크]
|
1. 오라클(Oracle)
- Group by 와 count(*) 함수
Oracle Programming |
select deptno,count(*) as emp_cnt
from emp
group
by deptno
2. 파이썬(Pandas)
- Groupby 와 집계 함수
Python Programming |
emp.groupby(emp['deptno'])['deptno'].describe()[['count', 'mean']].reset_index()
Results |
deptno | count | mean | |
0 | 10 | 3.0 | 10.0 |
1 | 20 | 5.0 | 20.0 |
2 | 30 | 6.0 | 30.0 |
- Groupby 와 size() 함수
Python Programming |
emp.groupby(['deptno']).size().reset_index(name='counts')
Results |
deptno | counts | |
0 | 10 | 3 |
1 | 20 | 5 |
2 | 30 | 6 |
- Groupby 와 집계 함수(agg 함수)
Python Programming |
emp[['deptno']].groupby(emp['deptno']).agg(['count'])
Results |
deptno | |
count | |
--- | --- |
deptno | |
--- | --- |
10 | 3 |
20 | 5 |
30 | 6 |
- Groupby 와 집계 함수(agg 함수)
Python Programming |
emp.groupby('deptno').agg({'deptno':pd.Series.nunique, 'deptno':['count']})
Results |
deptno | |
count | |
--- | --- |
deptno | |
--- | --- |
10 | 3 |
20 | 5 |
30 | 6 |
- value_counts (pandas=1.1 이후 버젼에서 사용가능)
Python Programming |
emp.value_counts(subset=['deptno'])
Results |
deptno
30 6
20 5
10 3
dtype: int64
- Groupby 와 summary_cont 함수
Python Programming |
rp.summary_cont(emp.groupby(['deptno'])[['deptno']])
Results |
C:\Users\BACK\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:2003: RuntimeWarning: invalid value encountered in multiply
lower_bound = _a * scale + loc
C:\Users\BACK\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:2004: RuntimeWarning: invalid value encountered in multiply
upper_bound = _b * scale + loc
deptno | |
N | |
--- | --- |
deptno | |
--- | --- |
10 | 3 |
20 | 5 |
30 | 6 |
3. R Programming (R Package)
- by 와 집계 함수(Aggregate 함수)
R Programming |
%%R
aggregate(emp$empno, by=list(emp$deptno), FUN=length)
Results |
Group.1 length
1 10 3
2 20 5
3 30 6
- Aggregate 함수
R Programming |
%%R
aggregate(empno~deptno,FUN=length,data=emp)
Results |
deptno empno
1 10 3
2 20 5
3 30 6
- 집계 함수(Aggregate 함수) 와 length() 함수
R Programming |
%%R
aggregate(empno ~ deptno, data = emp, FUN = function(x) c(max = length(x) ) )
Results |
deptno empno
1 10 3
2 20 5
3 30 6
- table() 함수의 리스트
R Programming |
%%R
table(emp$deptno)
Results |
10 20 30
3 5 6
- tapply 함수와 length 연산
R Programming |
%%R
tapply(emp$deptno, emp$deptno, length)
Results |
10 20 30
3 5 6
4. R Dplyr Package
- group_by 와 count() 함수
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::count()
Results |
# A tibble: 3 x 2
# Groups: deptno [3]
deptno n
<dbl> <int>
1 10 3
2 20 5
3 30 6
5. R sqldf Package
- group by 와 count() 함수
R Programming |
%%R
sqldf("select deptno,count(*) emp_cnt
from emp
group by deptno")
Results |
deptno emp_cnt
1 10 3
2 20 5
3 30 6
6. Python pandasql Package
- group by 와 count() 함수
Python Programming |
ps.sqldf("select deptno,count(*) emp_cnt from emp group by deptno")
Results |
deptno | emp_cnt | |
0 | 10 | 3 |
1 | 20 | 5 |
2 | 30 | 6 |
7. R data.table Package
- keyby= 와 .N 연산
R Programming |
%%R
DT <- data.table(emp)
DT[, .('count' = .N), keyby = list(deptno)]
Results |
deptno count
1: 10 3
2: 20 5
3: 30 6
8. SAS Proc SQL
- group by 와 count() 함수
SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
SELECT deptno,count(*) AS emp_cnt
from emp
group
by 1;
QUIT;
PROC PRINT data=STATSAS_1(obs=3);RUN;
Results |
OBS | deptno | emp_cnt |
1 | 10 | 3 |
2 | 20 | 5 |
3 | 30 | 6 |
9. SAS Data Step
- Proc summary 프로시져의 N= 연산
SAS Programming |
%%SAS sas
PROC SUMMARY DATA=EMP NWAY;
CLASS DEPTNO;
VAR SAL;
OUTPUT OUT=EMP_COUNT(DROP=_:) N=emp_cnt;
RUN;
PROC PRINT;RUN;
Results |
OBS | deptno | emp_cnt |
1 | 10 | 3 |
2 | 20 | 5 |
3 | 30 | 6 |
10. Python Dfply Package
- group_by 와 집계 함수(Summarize 함수)
Python Programming |
emp >> group_by('deptno') >> summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) )
Results |
deptno | emp_cnt | emp_cnt1 | emp_cnt2 | |
0 | 10 | 3 | 3 | 3 |
1 | 20 | 5 | 5 | 5 |
2 | 30 | 6 | 6 | 6 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트
반응형
댓글