포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
MIN Oracle Function
MIN함수는 (개별 그룹내의) 사용자가 지정한 인수의 최소값을 반환한다. 집계 함수, 분석 함수 모두 이용 가능하다.
- 함수설명 : MIN 오라클 함수 링크
1. Oracle(오라클)
- Min 함수
다음 예제는 employees 테이블의 개별 부서내에서 최소 급여를 계산한다.
Oracle Programming |
SELECT deptno,MIN(sal) sal_min
FROM emp
group by deptno
Results |
DEPTNO SAL_MIN
------------------
30 950
10 1300
20 800
2. Python Pandas(파이썬)
1) 집계함수(Aggregate function)
- Min 함수
Python Programming |
emp.groupby('deptno')['sal'].min()
Results |
deptno
10 1300
20 800
30 950
Name: sal, dtype: int64
- Apply()와 Min 함수
Python Programming |
emp.groupby('deptno')['sal'].apply(min)
Results |
deptno
10 1300
20 800
30 950
Name: sal, dtype: int64
- Rank() 함수
Python Programming |
emp.groupby('deptno').apply(lambda x : x[ x['sal'].rank() ==1 ])
Results |
empno ename job mgr hiredate sal comm deptno
deptno
10 13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10
20 0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20
30 11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30
2) 분석함수(Analytic Function)
- groupby()와 Min 함수
Python Programming |
withmooc = emp.copy()
group_min = withmooc.groupby('deptno')['sal'].min()
withmooc['sal_min'] = withmooc['deptno'].map(group_min)
withmooc.sort_values('deptno').head()
Results |
empno ename job mgr hiredate sal comm deptno sal_min
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1300
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1300
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1300
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 800
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 800
- Transform()과 Min 함수
Python Programming |
withmooc = emp.copy()
withmooc['min_value'] = emp.groupby('deptno')['sal'].transform(lambda x:x.min())
withmooc.sort_values('deptno').head()
Results |
empno ename job mgr hiredate sal comm deptno min_value
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1300
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1300
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1300
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 800
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 800
- Transform()과 Min 함수
Python Programming |
withmooc = emp.copy()
withmooc['min_value'] = emp.groupby('deptno')['sal'].transform('min')
withmooc.sort_values('deptno').head()
Results |
empno ename job mgr hiredate sal comm deptno min_value
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1300
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1300
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1300
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 800
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 800
3. R Programming (R Package)
1) 집계함수 (Aggregate function)
- Aggregate()와 Min
R Programming |
%%R
aggregate(sal ~ deptno, data=emp , FUN=min)
Results |
deptno sal
1 10 1300
2 20 800
3 30 950
- by()와 Min
R Programming |
%%R
unlist(with(emp, by(sal, deptno, min)))
Results |
deptno: 10
[1] 1300
------------------------------------------------------------
deptno: 20
[1] 800
------------------------------------------------------------
deptno: 30
[1] 950
- Map과 Min
R Programming |
%%R
do.call(rbind, Map(min, split(emp$sal, emp$deptno)))
Results |
[,1]
10 1300
20 800
30 950
- purrr::map_dfr()과 min
R Programming |
%%R
# purrr::map_chr(split(emp,emp$deptno), ~min(.x$sal))
purrr::map_dfr(split(emp,emp$deptno), ~min(.x$sal))
Results |
# A tibble: 1 x 3
`10` `20` `30`
<dbl> <dbl> <dbl>
1 1300 800 950
- purrr::map_dfr()과 min
R Programming |
%%R
do.call(rbind, purrr::map_dfr(split(emp,emp$deptno), ~min(.x$sal)) )
Results |
[,1]
10 1300
20 800
30 950
- lapply()와 Min
R Programming |
%%R
do.call(rbind, lapply(split(emp$sal, emp$deptno), min))
Results |
[,1]
10 1300
20 800
30 950
2) 분석함수(Analytic Function)
- ave()와 min
R Programming |
%%R
withmooc <- emp
withmooc['sal_min'] = with(withmooc, ave(x = sal, deptno, FUN=min)) # FUN =function(x) { median(x) }
withmooc[order(withmooc$deptno),][1:10,]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_min
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 800
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 800
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 950
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 950
- ave()와 Min
R Programming |
%%R
withmooc <- emp
withmooc['sal_median'] = ave(x = withmooc$sal, withmooc$deptno, FUN = min) # FUN = function (x) { min(x) }
withmooc[order(withmooc$deptno),][1:10,]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_median
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 800
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 800
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 950
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 950
- purrr::map_chr()과 min
R Programming |
%%R
withmooc <- emp
Min_var <- with(withmooc, purrr::map_chr(base::split(withmooc,withmooc$deptno), ~min(.x$sal)) )
data.frame(withmooc, Min_value =rep(Min_var, table(withmooc$deptno)))[1:10,]
Results |
empno ename job mgr hiredate sal comm deptno Min_value
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1300.000000
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1300.000000
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1300.000000
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800.000000
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 800.000000
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 800.000000
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 800.000000
8 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800.000000
9 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 950.000000
10 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 950.000000
- tapply와 min
R Programming |
%%R
withmooc <- emp
Min_var <- with(withmooc, tapply(sal, deptno, min))
data.frame(emp, min_value =rep(Min_var, table(withmooc$deptno)))
# with(withmooc, data.frame(emp, min_value=rep(tapply(sal, deptno, min), table(deptno))) )[1:10,]
Results |
empno ename job mgr hiredate sal comm deptno min_value
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1300
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1300
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1300
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 800
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 800
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 800
8 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
9 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 950
10 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 950
11 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 950
12 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 950
13 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 950
14 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
- plyr::ddply()와 Min
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
plyr::ddply(emp,.(deptno),transform,sal_min = min(sal))[1:10,]
Results |
empno ename job mgr hiredate sal comm deptno sal_min
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 800
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 800
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 950
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 950
- purrr::map_dfc()와 min
R Programming |
%%R
withmooc <- emp
mapping <- t( purrr::map_dfc( split(withmooc,withmooc$deptno), ~min(.x$sal) ) )
trans_data <- melt(mapping, value.name="sal_min")
merge(emp,trans_data,by.x='deptno',by.y='Var1')[1:10,]
Results |
deptno empno ename job mgr hiredate sal comm Var2 sal_min
1 10 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 1 1300
2 10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 1 1300
3 10 7934 MILLER CLERK 7782 1982-01-23 1300 NA 1 1300
4 20 7369 SMITH CLERK 7902 1980-12-17 800 NA 1 800
5 20 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 1 800
6 20 7566 JONES MANAGER 7839 1981-04-02 2975 NA 1 800
7 20 7902 FORD ANALYST 7566 1981-12-03 3000 NA 1 800
8 20 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 1 800
9 30 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 1 950
10 30 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 1 950
4. R Dplyr Package
1) 집계함수(Aggregate function)
- dplyr::summarize()와 Min
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::summarize(sal_min = min(sal)) %>%
head()
Results |
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
deptno sal_min
<dbl> <dbl>
1 10 1300
2 20 800
3 30 950
- map()과 min
R Programming |
%%R
emp %>%
tbl_df() %>%
nest(-deptno) %>%
mutate(sal_min = map(data, ~ min(.$sal))) %>%
unnest(sal_min) %>%
head()
Results |
# A tibble: 3 x 3
deptno data sal_min
<dbl> <list> <dbl>
1 20 <tibble [5 x 7]> 800
2 30 <tibble [6 x 7]> 950
3 10 <tibble [3 x 7]> 1300
2) 분석함수(Analytic Function)
- min() 함수
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_min = min(sal)) %>%
dplyr::arrange(deptno) %>%
head()
Results |
# A tibble: 6 x 9
# Groups: deptno [2]
empno ename job mgr hiredate sal comm deptno sal_min
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
- purrr:::map_dfr()과 min()
R Programming |
%%R
withmooc <- emp %>% mutate(deptno = as.character(deptno))
withmooc %>%
dplyr::left_join( ( purrr::map_dfr(split(withmooc,withmooc$deptno), ~min(.x$sal)) %>%
gather(key = deptno, value = sal_min )
),
id = "deptno"
) %>%
dplyr::arrange(deptno) %>%
head()
Results |
Joining, by = "deptno"
# A tibble: 6 x 9
empno ename job mgr hiredate sal comm deptno sal_min
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <chr> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
5. R sqldf Package
1) 집계함수 (Aggregate function)
- Min 함수
R Programming |
%%R
sqldf(" SELECT deptno,MIN(sal) sal_min FROM emp group by deptno; ")
Results |
deptno sal_min
1 10 1300
2 20 800
3 30 950
6. Python pandasql Package
1) 집계함수 (Aggregate function)
- Min 함수
Python Programming |
ps.sqldf(" SELECT deptno,MIN(sal) sal_min FROM emp group by deptno; ")
Results |
deptno sal_min
0 10 1300
1 20 800
2 30 950
7. R data.table Package
1) 집계함수 (Aggregate function)
- Min 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, min(sal), by =deptno]
Results |
deptno V1
1: 20 800
2: 30 950
3: 10 1300
2) 분석함수 (Analytic Function)
- Min 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_min := min(sal), by = deptno][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_min
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 800
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 800
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 950
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 950
- Min 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, `:=` ( sal_min = min(unlist(.SD)) ),
by = deptno,
.SDcols = c("sal")][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_min
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1300
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1300
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1300
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 800
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 800
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 800
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 800
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 800
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 950
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 950
8. Python Duckdb의 SQL
1) 집계함수 (Aggregate function)
Python Programming |
%%sql
SELECT deptno,
MIN(sal) sal_min
FROM emp
group by deptno
Python Programming |
duckdb.sql(" SELECT deptno, \
MIN(sal) sal_min \
FROM emp \
group by deptno ").df()
Results |
deptno sal_min
0 20 800
1 30 950
2 10 1300
2) 분석함수 (Analytic Function)
Python Programming |
%%sql
SELECT MGR, HIREDATE, SAL, EMPNO, ENAME,
MIN(SAL) OVER (PARTITION BY MGR ORDER BY HIREDATE) MGR_MIN
FROM emp
ORDER BY MGR, HIREDATE
LIMIT 7
Python Programming |
duckdb.sql(" SELECT MGR, HIREDATE, SAL, EMPNO, ENAME, \
MIN(SAL) OVER (PARTITION BY MGR ORDER BY HIREDATE) MGR_MIN \
FROM emp \
ORDER BY MGR, HIREDATE \
LIMIT 7 ").df()
Results |
mgr hiredate sal empno ename MGR_MIN
0 7566.0 1981/12/03 3000 7902 FORD 3000
1 7566.0 1982/12/09 3000 7788 SCOTT 3000
2 7698.0 1981/02/20 1600 7499 ALLEN 1600
3 7698.0 1981/02/22 1250 7521 WARD 1250
4 7698.0 1981/09/08 1500 7844 TURNER 1250
5 7698.0 1981/09/28 1250 7654 MARTIN 1250
6 7698.0 1981/12/03 950 7900 JAMES 950
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글