포스팅 목차
43. Display the various jobs and total number of employees with each job group.
* 개별 직무에 속한 직원의 총 인원수를 출력하시오.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- [그룹별 집계함수] 그룹별 건수 집계
- 좀 더 상세한 내용은 42문제 와 아래 내용 참조 - 동일 문제
- [데이터 관리] 8. By 또는 Split 파일 프로세싱 [링크]
- Python stackoverflow : [링크]
- R stackoverflow : [링크]
|
1. Oracle(오라클)
- Count() 함수
Oracle Programming |
select job, count(*) as emp_cnt
from emp
group
by job;
2. Python Pandas(파이썬)
- value_counts (pandas=1.1 이후 버젼에서 사용가능)
Python Programming |
emp.value_counts(subset=['job']).reset_index(name='counts')
Results |
job | counts | |
0 | SALESMAN | 4 |
1 | CLERK | 4 |
2 | MANAGER | 3 |
3 | ANALYST | 2 |
4 | PRESIDENT | 1 |
3. R Programming (R Package)
- Aggregate() 함수 와 length
R Programming |
%%R
aggregate(empno~job,FUN=length,data=emp)
Results |
job empno
1 ANALYST 2
2 CLERK 4
3 MANAGER 3
4 PRESIDENT 1
5 SALESMAN 4
4. R Dplyr Package
- Count() 함수
R Programming |
%%R
emp %>%
dplyr::group_by(job) %>%
dplyr::count( name = 'emp_cnt' )
Results |
# A tibble: 5 x 2
# Groups: job [5]
job emp_cnt
<chr> <int>
1 ANALYST 2
2 CLERK 4
3 MANAGER 3
4 PRESIDENT 1
5 SALESMAN 4
5. R sqldf Package
- Count() 함수
R Programming |
%%R
sqldf(" select job, count(*) emp_cnt
from emp
group by job")
Results |
job emp_cnt
1 ANALYST 2
2 CLERK 4
3 MANAGER 3
4 PRESIDENT 1
5 SALESMAN 4
6. Python pandasql Package
- Count() 함수
Python Programming |
ps.sqldf("select job, count(*) from emp group by job")
Results |
job | count(*) | |
0 | ANALYST | 2 |
1 | CLERK | 4 |
2 | MANAGER | 3 |
3 | PRESIDENT | 1 |
4 | SALESMAN | 4 |
7. R data.table Package
- .N 함수
R Programming |
%%R
DT <- data.table(emp)
DT[, .N, by = list(job)]
Results |
job N
1: CLERK 4
2: SALESMAN 4
3: MANAGER 3
4: ANALYST 2
5: PRESIDENT 1
8. SAS Proc SQL
- Count() 함수
SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
select job, count(*) as emp_cnt
from emp
group
by job;
QUIT;
PROC PRINT;RUN;
Results |
OBS | job | emp_cnt |
1 | ANALYST | 2 |
2 | CLERK | 4 |
3 | MANAGER | 3 |
4 | PRESIDEN | 1 |
5 | SALESMAN | 4 |
9. SAS Data Step
- Proc summary 프로시져와 N
SAS Programming |
%%SAS sas
PROC SUMMARY DATA=EMP NWAY;
CLASS job;
VAR SAL;
OUTPUT OUT=EMP_COUNT(DROP=_:) N=emp_cnt;
RUN;
PROC PRINT;RUN;
Results |
OBS | job | emp_cnt |
1 | ANALYST | 2 |
2 | CLERK | 4 |
3 | MANAGER | 3 |
4 | PRESIDEN | 1 |
5 | SALESMAN | 4 |
SAS Programming |
%%SAS sas
PROC SORT DATA=EMP OUT=EMP_1;
BY job;
RUN;
DATA emp_cnt;
SET EMP_1;
BY job;
IF FIRST.job THEN emp_cnt = 1;
ELSE emp_cnt + 1;
IF LAST.job THEN OUTPUT emp_cnt;
KEEP DEPTNO emp_cnt;
RUN;
PROC PRINT;RUN;
Results |
OBS | deptno | emp_cnt |
1 | 20 | 2 |
2 | 10 | 4 |
3 | 10 | 3 |
4 | 10 | 1 |
5 | 30 | 4 |
10. Python Dfply Package
- Summarize() 함수 와 count() / Nunique() / N()
Python Programming |
emp >> group_by('job') >> summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) )
Results |
job | emp_cnt | emp_cnt1 | emp_cnt2 | |
0 | ANALYST | 2 | 2 | 2 |
1 | CLERK | 4 | 4 | 4 |
2 | MANAGER | 3 | 3 | 3 |
3 | PRESIDENT | 1 | 1 | 1 |
4 | SALESMAN | 4 | 4 | 4 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트
반응형
댓글