[HAVING clause: operating on grouped results] Selecting groups that satisfy a condition after group-wise aggregation


48. Display the department numbers with more than three employees in each dept.

* Output only the departments that have more than three employees.

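Link: loading the Python & R packages and creating the example data
[Group-wise aggregate functions and conditions] Selecting groups that satisfy a condition after counting rows per group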

Oracle : group by, count(*), having
Python Pandas : groupby(), .count(), .query(), .loc, slicing, .max()
R programming : aggregate(), user-defined function, length, function, subset()
R Dplyr Package : dplyr::group_by(), dplyr::summarise(), n(), filter()
R sqldf Package : group by, count(*), having
Python pandasql Package : group by, count(*), having
R data.table Package : .N, keyby=, .SD (Subset)
SAS Proc SQL : group by, count(*), having
SAS Data Step : proc summary, N=, where=, first., last.
Python Dfply Package : group_by(), summarize(), .count(), .nunique(), .n(), ungroup(), filter_by()
Python base programming :

1. Oracle

GROUP BY clause and HAVING clause

Oracle Programming

select deptno, 
       count(*) emp_cnt
from   emp 
group 
   by  deptno 
having count(*) > 3;
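
As a way to try this query without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are hypothetical: only empno and deptno are created, and the department sizes (3, 5, and 6 rows) are an assumption chosen for illustration. The sketch also runs an equivalent subquery filtered with WHERE, to show that HAVING applies its condition after the groups have been aggregated.

```python
import sqlite3

# Hypothetical stand-in for the emp table: (empno, deptno) pairs only.
# Department sizes (10 -> 3 rows, 20 -> 5 rows, 30 -> 6 rows) are assumed.
rows = [(7001 + i, dept) for i, dept in enumerate([10] * 3 + [20] * 5 + [30] * 6)]

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (empno integer, deptno integer)")
conn.executemany("insert into emp values (?, ?)", rows)

# HAVING filters the groups produced by GROUP BY.
having_query = """
    select deptno, count(*) as emp_cnt
    from   emp
    group by deptno
    having count(*) > 3
    order by deptno
"""
having_result = conn.execute(having_query).fetchall()

# Equivalent form: aggregate in a subquery, then filter the aggregated rows with WHERE.
subquery = """
    select deptno, emp_cnt
    from  (select deptno, count(*) as emp_cnt from emp group by deptno)
    where  emp_cnt > 3
    order by deptno
"""
subquery_result = conn.execute(subquery).fetchall()

print(having_result)
print(having_result == subquery_result)
conn.close()
```

With the assumed data, both queries drop department 10 because its count is exactly three.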

2. Python Pandas

groupby() and query()

Python Programming

emp.groupby('deptno')['sal'].count().reset_index().query('sal > 3')

Results

	deptno	sal
1	20	5
2	30	6
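
One detail worth checking: .count() on the sal column counts non-null salaries, whereas count(*) in SQL counts rows. The two agree only when sal has no missing values. The sketch below uses a hypothetical frame (emp_demo, with two missing salaries placed in department 20 as an assumption) to show the difference, and uses .size() as the row-count equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical data: department 20 has five rows, two of them with a missing salary.
emp_demo = pd.DataFrame({
    "deptno": [10] * 3 + [20] * 5 + [30] * 6,
    "sal": [1000.0] * 3 + [np.nan] * 2 + [2000.0] * 3 + [1500.0] * 6,
})

# .count() ignores missing salaries, so department 20 is counted as 3.
by_count = (
    emp_demo.groupby("deptno")["sal"].count()
    .reset_index(name="emp_cnt")
    .query("emp_cnt > 3")
)

# .size() counts rows, like count(*), so department 20 is counted as 5.
by_size = (
    emp_demo.groupby("deptno").size()
    .reset_index(name="emp_cnt")
    .query("emp_cnt > 3")
)

print(by_count)
print(by_size)
```

If the salary column in the real data is complete, the two approaches return the same departments.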

groupby() and selecting observations

Python Programming

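# Count non-null salaries per department; the result is a Series indexed by deptno.
emp1 = emp['sal'].groupby(emp['deptno']).count()
# The boolean mask keeps only departments with more than three employees.
emp1.loc[emp1 > 3]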

Results

deptno
20    5
30    6
Name: sal, dtype: int64
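
The selection above returns one count per department. When the employee rows themselves are needed for the qualifying departments, a related pattern is groupby().filter(), or a boolean mask built with transform("size"). The following is a sketch on a hypothetical frame (emp_demo, with assumed empno values and department sizes), not part of the original example.

```python
import pandas as pd

# Hypothetical data: 14 employees in departments of size 3, 5, and 6.
emp_demo = pd.DataFrame({
    "empno": range(7001, 7015),
    "deptno": [10] * 3 + [20] * 5 + [30] * 6,
})

# Keep every employee row whose department has more than three rows.
big_depts = emp_demo.groupby("deptno").filter(lambda g: len(g) > 3)

# Same rows via a per-row group size, without calling a Python function per group.
sizes = emp_demo.groupby("deptno")["empno"].transform("size")
big_depts_mask = emp_demo[sizes > 3]

print(big_depts["deptno"].unique())
print(len(big_depts))
```

Both forms keep the original row order and index, so the results can be compared directly.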

[Note] groupby() and selecting the department with the largest count

Python Programming

emp1 = emp['sal'].groupby(emp['deptno']).count()
# Keep the department(s) whose count equals the largest count.
emp1.loc[emp1 == emp1.max()]

Results

deptno
30    6
Name: sal, dtype: int64
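
Comparing against .max() keeps every department tied for the largest count, while .idxmax() returns only the first matching label. The sketch below illustrates this with a hypothetical Series of counts in which departments 20 and 30 are tied; the values are assumptions, not the counts from this post.

```python
import pandas as pd

# Hypothetical per-department counts with a tie between departments 20 and 30.
counts = pd.Series(
    [3, 6, 6],
    index=pd.Index([10, 20, 30], name="deptno"),
    name="sal",
)

# The mask comparison returns all departments that share the maximum.
all_max = counts.loc[counts == counts.max()]

# idxmax() returns the label of the first maximum only.
first_max = counts.idxmax()

print(all_max)
print(first_max)
```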

3. R Programming (R Package)

Group-wise aggregation (aggregate()) and subset()

R Programming

%%R

subset( aggregate(sal ~ deptno, data = emp, FUN = function(x) c(count_sal = length(x) ) ), sal >3 )

Results

  deptno sal
2     20   5
3     30   6

4. R Dplyr Package

group_by() and filter()

R Programming

%%R

emp %>%
  group_by(deptno) %>% summarise( n_sal=n() ) %>% filter(n_sal > 3)

Results

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
  deptno n_sal
   <dbl> <int>
1     20     5
2     30     6

5. R sqldf Package

GROUP BY clause and HAVING clause

R Programming

%%R
sqldf("select deptno, count(*) from emp group by deptno having count(*)>3")

Results

  deptno count(*)
1     20        5
2     30        6

6. Python pandasql Package

GROUP BY clause and HAVING clause

Python Programming

ps.sqldf("select deptno, count(*) from emp group by deptno having count(*)>3")

Results

	deptno	count(*)
0	20	5
1	30	6

7. R data.table Package

keyby= and .SD

R Programming

%%R

DT <- data.table(emp)

DT[, .SD[.N > 3, .('N_count' = .N)], keyby = .(deptno)]

Results

   deptno N_count
1:     20       5
2:     30       6

8. SAS Proc SQL

GROUP BY clause and HAVING clause

SAS Programming

%%SAS sas

PROC SQL;
  CREATE TABLE STATSAS_1 AS
    select deptno, count(*) as sal_cnt
    from   emp 
    group 
       by  deptno
    having count(*) > 3;
QUIT;
PROC PRINT;RUN;

Results

OBS	deptno	sal_cnt
1	20	5
2	30	6

9. SAS Data Step

PROC SUMMARY and the WHERE= option

SAS Programming

%%SAS sas

PROC SUMMARY DATA=EMP NWAY;
     CLASS deptno;
     VAR   SAL;
     OUTPUT OUT=STATSAS_2(DROP=_: WHERE=(sal_cnt>3)) n=sal_cnt;
RUN;
PROC PRINT;RUN;

Results

OBS	deptno	sal_cnt
1	20	5
2	30	6

BY statement and the WHERE= option

SAS Programming

%%SAS sas

PROC SORT DATA=EMP OUT=EMP_1;
     BY deptno;
RUN;

DATA STATSAS_3(WHERE=(sal_cnt > 3));
 SET EMP_1;
     BY deptno;

     IF FIRST.deptno THEN sal_cnt = 1;
     ELSE DO;
        sal_cnt+1;
     END;

     IF LAST.deptno THEN OUTPUT STATSAS_3;  * IF LAST.deptno;
     KEEP DEPTNO sal_cnt;
RUN;
PROC PRINT;RUN;

Results

OBS	deptno	sal_cnt
1	20	5
2	30	6

10. Python Dfply Package

group_by() and filter_by()

Python Programming

emp >>                                                                                           \
  group_by('deptno') >>                                                                          \
  summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) ) >> \
  ungroup() >>                                                                                   \
  filter_by( X.emp_cnt > 3 )

Results

	deptno	emp_cnt	emp_cnt1	emp_cnt2
1	20	5	5	5
2	30	6	6	6
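
The method list at the top includes an entry for base Python but gives no code for it. One possible approach, shown here only as a sketch and not as the author's method, is to count departments with collections.Counter and then filter the counts. The emp_rows list is hypothetical stand-in data with assumed empno values and department sizes.

```python
from collections import Counter

# Hypothetical employee records; only empno and deptno are modeled.
emp_rows = [
    {"empno": 7001 + i, "deptno": dept}
    for i, dept in enumerate([10] * 3 + [20] * 5 + [30] * 6)
]

# "group by deptno, count(*)": tally rows per department.
dept_counts = Counter(row["deptno"] for row in emp_rows)

# "having count(*) > 3": keep only departments whose tally exceeds three.
having_gt_3 = {dept: cnt for dept, cnt in sorted(dept_counts.items()) if cnt > 3}

print(having_gt_3)
```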

[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] List of table data processing methods illustrated with the SQL EMP examples
