포스팅 목차
43. Display the various jobs and total number of employees with each job group.
* 개별 직무에 속한 직원의 총 인원수를 출력하시오.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- [그룹별 집계함수] 그룹별 건수 집계
- 좀 더 상세한 내용은 42문제 와 아래 내용 참조 - 동일 문제
- [데이터 관리] 8. By 또는 Split 파일 프로세싱 [링크]
- Python stackoverflow : [링크]
- R stackoverflow : [링크]
|
1. Oracle(오라클)
- Count() 함수
| Oracle Programming |
select job, count(*) as emp_cnt
from emp
group
by job;
2. Python Pandas(파이썬)
- value_counts (pandas=1.1 이후 버젼에서 사용가능)
| Python Programming |
emp.value_counts(subset=['job']).reset_index(name='counts')
| Results |
| job | counts | |
| 0 | SALESMAN | 4 |
| 1 | CLERK | 4 |
| 2 | MANAGER | 3 |
| 3 | ANALYST | 2 |
| 4 | PRESIDENT | 1 |
3. R Programming (R Package)
- Aggregate() 함수 와 length
| R Programming |
%%R
aggregate(empno~job,FUN=length,data=emp)
| Results |
job empno
1 ANALYST 2
2 CLERK 4
3 MANAGER 3
4 PRESIDENT 1
5 SALESMAN 4
4. R Dplyr Package
- Count() 함수
| R Programming |
%%R
emp %>%
dplyr::group_by(job) %>%
dplyr::count( name = 'emp_cnt' )
| Results |
# A tibble: 5 x 2
# Groups: job [5]
job emp_cnt
<chr> <int>
1 ANALYST 2
2 CLERK 4
3 MANAGER 3
4 PRESIDENT 1
5 SALESMAN 4
5. R sqldf Package
- Count() 함수
| R Programming |
%%R
sqldf(" select job, count(*) emp_cnt
from emp
group by job")
| Results |
job emp_cnt
1 ANALYST 2
2 CLERK 4
3 MANAGER 3
4 PRESIDENT 1
5 SALESMAN 4
6. Python pandasql Package
- Count() 함수
| Python Programming |
ps.sqldf("select job, count(*) from emp group by job")
| Results |
| job | count(*) | |
| 0 | ANALYST | 2 |
| 1 | CLERK | 4 |
| 2 | MANAGER | 3 |
| 3 | PRESIDENT | 1 |
| 4 | SALESMAN | 4 |
7. R data.table Package
- .N 함수
| R Programming |
%%R
DT <- data.table(emp)
DT[, .N, by = list(job)]
| Results |
job N
1: CLERK 4
2: SALESMAN 4
3: MANAGER 3
4: ANALYST 2
5: PRESIDENT 1
8. SAS Proc SQL
- Count() 함수
| SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
select job, count(*) as emp_cnt
from emp
group
by job;
QUIT;
PROC PRINT;RUN;
| Results |
| OBS | job | emp_cnt |
| 1 | ANALYST | 2 |
| 2 | CLERK | 4 |
| 3 | MANAGER | 3 |
| 4 | PRESIDEN | 1 |
| 5 | SALESMAN | 4 |
9. SAS Data Step
- Proc summary 프로시져와 N
| SAS Programming |
%%SAS sas
PROC SUMMARY DATA=EMP NWAY;
CLASS job;
VAR SAL;
OUTPUT OUT=EMP_COUNT(DROP=_:) N=emp_cnt;
RUN;
PROC PRINT;RUN;
| Results |
| OBS | job | emp_cnt |
| 1 | ANALYST | 2 |
| 2 | CLERK | 4 |
| 3 | MANAGER | 3 |
| 4 | PRESIDEN | 1 |
| 5 | SALESMAN | 4 |
| SAS Programming |
%%SAS sas
PROC SORT DATA=EMP OUT=EMP_1;
BY job;
RUN;
DATA emp_cnt;
SET EMP_1;
BY job;
IF FIRST.job THEN emp_cnt = 1;
ELSE emp_cnt + 1;
IF LAST.job THEN OUTPUT emp_cnt;
KEEP DEPTNO emp_cnt;
RUN;
PROC PRINT;RUN;
| Results |
| OBS | deptno | emp_cnt |
| 1 | 20 | 2 |
| 2 | 10 | 4 |
| 3 | 10 | 3 |
| 4 | 10 | 1 |
| 5 | 30 | 4 |
10. Python Dfply Package
- Summarize() 함수 와 count() / Nunique() / N()
| Python Programming |
emp >> group_by('job') >> summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) )
| Results |
| job | emp_cnt | emp_cnt1 | emp_cnt2 | |
| 0 | ANALYST | 2 | 2 | 2 |
| 1 | CLERK | 4 | 4 | 4 |
| 2 | MANAGER | 3 | 3 | 3 |
| 3 | PRESIDENT | 1 | 1 | 1 |
| 4 | SALESMAN | 4 | 4 | 4 |

[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트
반응형
댓글