포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ FIRST_VALUE Oracle Function ]
FIRST_VALUE함수는 분석 함수이다. 값의 정렬된 세트에서 첫 번째 값을 반환한다. 만약 집합 내에서 첫 번째 값이 NULL이라면, IGNORE NULLS를 지정하지 않은 한 NULL을 반환한다. 이 설정은 데이터 조밀화에 유용하다. 만약 IGNORE NULLS을 지정하면, FIRST_VALUE함수는 집합 내에서 최초 NON_NULL값을 반환하고, 만약 모든 값이 NULL이라면 NULL을 반환한다.
- 함수 설명 : FIRST_VALUE 오라클 함수 링크
- Group By 후 테이블 Re-Merge는 생략
1. Oracle(오라클)
Oracle Programming |
SELECT ENAME,
JOB,
SAL,
FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY
FROM emp
Results |
ENAME JOB SAL FIRST_SALARY
-------------------------------------
FORD ANALYST 3000 FORD
SCOTT ANALYST 3000 FORD
MILLER CLERK 1300 MILLER
ADAMS CLERK 1100 MILLER
JAMES CLERK 950 MILLER
SMITH CLERK 800 MILLER
JONES MANAGER 2975 JONES
BLAKE MANAGER 2850 JONES
CLARK MANAGER 2450 JONES
KING PRESIDENT 5000 KING
ALLEN SALESMAN 1600 ALLEN
TURNER SALESMAN 1500 ALLEN
MARTIN SALESMAN 1250 ALLEN
WARD SALESMAN 1250 ALLEN
2. Python Pandas(파이썬)
- [stackoverflow 참고] Pandas - group by one column, sort by another, get value from the third column [링크]
- [stackoverflow 참고] Apply vs transform on a group object [링크]
Python Programming |
withmooc = emp.copy()
group_first = withmooc.sort_values('sal',ascending = False).groupby('job')['ename'].first()
withmooc['Time'] = withmooc['job'].map(group_first)
withmooc.sort_values('job').head()
Results |
empno ename job mgr hiredate sal comm deptno Time
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 SCOTT
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 SCOTT
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 MILLER
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 MILLER
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 MILLER
Python Programming |
withmooc = emp.copy()
withmooc['first_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform(lambda x:x.iloc[0])
withmooc.sort_values('job').head()
Results |
empno ename job mgr hiredate sal comm deptno first_value
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 SCOTT
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 SCOTT
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 MILLER
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 MILLER
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 MILLER
Python Programming |
withmooc = emp.copy()
withmooc['first_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform('first')
withmooc.sort_values('job').head()
Results |
empno ename job mgr hiredate sal comm deptno first_value
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 SCOTT
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 SCOTT
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 MILLER
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 MILLER
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 MILLER
[참고]
- job 그룹별 ename에 대한 rank
Python Programming |
withmooc = emp.copy()
withmooc.sort_values('sal', ascending=False).groupby('job').apply(lambda x : x[ x['ename'].rank() ==1 ]["ename"])
Results |
job
ANALYST 12 FORD
CLERK 10 ADAMS
MANAGER 5 BLAKE
PRESIDENT 8 KING
SALESMAN 1 ALLEN
Name: ename, dtype: object
Python Programming |
emp.sort_values('sal',ascending = False).groupby('job')['ename'].apply(max)
Results |
job
ANALYST SCOTT
CLERK SMITH
MANAGER JONES
PRESIDENT KING
SALESMAN WARD
Name: ename, dtype: object
3. R Programming (R Package)
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
plyr::ddply(emp,.(job),transform,first_value = dplyr::first(ename, order_by = desc(sal) ))[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno first_value
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 MILLER
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 MILLER
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 MILLER
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 MILLER
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
withmooc['first_value'] = with(withmooc, ave(x = ename, job, FUN =function(x) { dplyr::first(x) }))
# withmooc['first_value'] = with(emp[order(-emp$sal),], ave(x = ename, job, FUN =function(x) { x[1] }))
withmooc[order(withmooc$job),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno first_value
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 MILLER
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 MILLER
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 MILLER
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 MILLER
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
withmooc['first_value'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) { dplyr::first(x) } )
# withmooc['first_value'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) { x[1] } )
withmooc[order(withmooc$job),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno first_value
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 MILLER
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 MILLER
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 MILLER
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 MILLER
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
R Programming |
%%R
withmooc <- emp
# split 되는 그룹내의 관측치 순서 보존을 위하여 데이터와 group 데이터 양쪽 재 정렬 필요
First_var <- with(withmooc, purrr::map_chr(base::split(withmooc[order(-withmooc$sal),],withmooc[order(-withmooc$sal),]$job), ~first(.x$ename)) )
data.frame(withmooc, First_value =rep(First_var, table(withmooc$job)))[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno First_value
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SCOTT
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 SCOTT
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 MILLER
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 MILLER
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 MILLER
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 MILLER
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
8 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 JONES
9 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 JONES
10 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 KING
- [stackoverflow 참고] In R, how do I add a max by group? [duplicate] [링크]
R Programming |
%%R
withmooc <- emp
First_var <- with(withmooc[order(-withmooc$sal),], tapply(ename, job, first)) # head, 1
# First_var <- with(withmooc[order(-withmooc$sal),], tapply(ename, job, head, 1))
data.frame(emp, First_value =rep(First_var, table(withmooc$job)))[1:10, ]
# with(withmooc[order(-withmooc$sal),], data.frame(emp, max.per.group=rep(tapply(ename, job, first), table(job))) )
Results |
empno ename job mgr hiredate sal comm deptno First_value
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SCOTT
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 SCOTT
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 MILLER
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 MILLER
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 MILLER
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 MILLER
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
8 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 JONES
9 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 JONES
10 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 KING
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
mapping <- t( purrr::map_dfc( split(withmooc,withmooc$job), ~first(.x$ename) ) )
trans_data <- melt(mapping, value.name="first_ename")
merge(emp,trans_data,by.x='job',by.y='Var1')[1:10, ]
Results |
job empno ename mgr hiredate sal comm deptno Var2 first_ename
1 ANALYST 7902 FORD 7566 1981-12-03 3000 NA 20 1 SCOTT
2 ANALYST 7788 SCOTT 7566 1982-12-09 3000 NA 20 1 SCOTT
3 CLERK 7369 SMITH 7902 1980-12-17 800 NA 20 1 MILLER
4 CLERK 7934 MILLER 7782 1982-01-23 1300 NA 10 1 MILLER
5 CLERK 7876 ADAMS 7788 1983-01-12 1100 NA 20 1 MILLER
6 CLERK 7900 JAMES 7698 1981-12-03 950 NA 30 1 MILLER
7 MANAGER 7566 JONES 7839 1981-04-02 2975 NA 20 1 JONES
8 MANAGER 7782 CLARK 7839 1981-01-09 2450 NA 10 1 JONES
9 MANAGER 7698 BLAKE 7839 1981-03-01 2850 NA 30 1 JONES
10 PRESIDENT 7839 KING NA 1981-11-17 5000 NA 10 1 KING
[참고] Group By - first
R Programming |
%%R
do.call(rbind, Map(first, split(emp, emp$job)))
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
* <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
2 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
3 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
4 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
5 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
R Programming |
%%R
emp %>%
rownames_to_column %>%
gather(variable, value, -rowname) %>%
spread(variable, value) %>%
head(10)
Results |
# A tibble: 10 x 9
rowname comm deptno empno ename hiredate job mgr sal
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 <NA> 20 7369 SMITH 4003 CLERK 7902 800
2 10 0 30 7844 TURNER 4268 SALESMAN 7698 1500
3 11 <NA> 20 7876 ADAMS 4759 CLERK 7788 1100
4 12 <NA> 30 7900 JAMES 4354 CLERK 7698 950
5 13 <NA> 20 7902 FORD 4354 ANALYST 7566 3000
6 14 <NA> 10 7934 MILLER 4405 CLERK 7782 1300
7 2 300 30 7499 ALLEN 4068 SALESMAN 7698 1600
8 3 500 30 7521 WARD 4070 SALESMAN 7698 1250
9 4 <NA> 20 7566 JONES 4109 MANAGER 7839 2975
10 5 1400 30 7654 MARTIN 4288 SALESMAN 7698 1250
R Programming |
%%R
purrr::map_dfr(split(emp,emp$job), ~first(.x$ename))
Results |
# A tibble: 1 x 5
ANALYST CLERK MANAGER PRESIDENT SALESMAN
<chr> <chr> <chr> <chr> <chr>
1 SCOTT SMITH JONES KING ALLEN
R Programming |
%%R
do.call(rbind, purrr::map_dfr(split(emp,emp$job), ~first(.x$ename)) )
Results |
[,1]
ANALYST "SCOTT"
CLERK "SMITH"
MANAGER "JONES"
PRESIDENT "KING"
SALESMAN "ALLEN"
R Programming |
%%R
emp %>%
tbl_df() %>%
nest(-job) %>%
mutate(Quantiles = map(data, ~ first(.$ename))) %>%
unnest(Quantiles)
Results |
# A tibble: 5 x 3
job data Quantiles
<chr> <list> <chr>
1 CLERK <tibble [4 x 7]> SMITH
2 SALESMAN <tibble [4 x 7]> ALLEN
3 MANAGER <tibble [3 x 7]> JONES
4 ANALYST <tibble [2 x 7]> SCOTT
5 PRESIDENT <tibble [1 x 7]> KING
[참고] Emulate the window function of SQL in R
- [stackoverflow 참고] Emulate the window function of SQL in R [링크]
R Programming |
%%R
withmooc <- emp
unlist(with(withmooc[order(-emp$sal),], by(ename, job, head,1)))
Results |
job: ANALYST
[1] "SCOTT"
------------------------------------------------------------
job: CLERK
[1] "MILLER"
------------------------------------------------------------
job: MANAGER
[1] "JONES"
------------------------------------------------------------
job: PRESIDENT
[1] "KING"
------------------------------------------------------------
job: SALESMAN
[1] "ALLEN"
R Programming |
%%R
aggregate(ename ~ job, data=emp[order(-emp$sal),], FUN=head,1)
Results |
job ename
1 ANALYST SCOTT
2 CLERK MILLER
3 MANAGER JONES
4 PRESIDENT KING
5 SALESMAN ALLEN
R Programming |
%%R
with(emp[order(-emp$sal),], ave(ename, job, FUN=head,1))
Results |
[1] "KING" "SCOTT" "FORD" "JONES" "BLAKE" "CLARK" "ALLEN" "TURNER"
[9] "MILLER" "WARD" "MARTIN" "ADAMS" "JAMES" "SMITH"
[참고] first / Last
- 리스트 반복 작업 purrr 패키지
R Programming |
%%R
purrr::map_chr(split(emp,emp$job), ~first(.x$ename))
Results |
ANALYST CLERK MANAGER PRESIDENT SALESMAN
"SCOTT" "SMITH" "JONES" "KING" "ALLEN"
R Programming |
%%R
withmooc <- emp
do.call(rbind, lapply(split(withmooc[order(withmooc$sal, decreasing=TRUE),], withmooc$job), head, 1))
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
* <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
3 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
4 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
5 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
R Programming |
%%R
do.call(rbind, lapply(split(withmooc[order(withmooc$sal, decreasing=TRUE),], withmooc$job), `[`, 1, ))
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
* <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
3 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
4 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
5 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
4. R Dplyr Package
[참고] dplyr 함수 구문 참고
R Programming |
%%R
dplyr::first
Results |
function (x, order_by = NULL, default = default_missing(x))
{
nth(x, 1L, order_by = order_by, default = default)
}
<bytecode: 0x00000210154b2470>
<environment: namespace:dplyr>
R Programming |
%%R
dplyr::nth
Results |
function (x, n, order_by = NULL, default = default_missing(x))
{
if (length(n) != 1 || !is.numeric(n)) {
abort("`n` must be a single integer.")
}
n <- trunc(n)
if (n == 0 || n > length(x) || n < -length(x)) {
return(default)
}
if (n < 0) {
n <- length(x) + n + 1
}
if (is.null(order_by)) {
x[[n]]
}
else {
x[[order(order_by)[[n]]]]
}
}
<bytecode: 0x00000210103471a8>
<environment: namespace:dplyr>
R Programming |
%%R
emp %>%
dplyr::arrange(-sal) %>%
dplyr::group_by(job) %>%
dplyr::mutate(first_value = dplyr::first(ename, order_by = desc(sal) )) %>%
dplyr::arrange(job) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: job [4]
empno ename job mgr hiredate sal comm deptno first_value
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 MILLER
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 MILLER
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 MILLER
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 MILLER
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
purrr 패키지
R Programming |
%%R
withmooc <- emp[order(withmooc$sal, decreasing=TRUE),]
emp %>%
dplyr::arrange(-sal) %>%
dplyr::left_join( ( purrr::map_dfr(split(withmooc,withmooc$job), ~first(.x$ename)) %>%
gather(key = job, value = first_job )
),
id = "job"
) %>%
dplyr::arrange(job) %>%
head(10)
Results |
Joining, by = "job"
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno first_job
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
[참고] First / Last
R Programming |
%%R
# 단일 건에 대한 중복 처리 필요(PRESIDENT)
emp %>% arrange(desc(sal)) %>% group_by(job) %>% slice(c(1,n()))
Results |
# A tibble: 10 x 8
# Groups: job [5]
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
6 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
7 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
8 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
10 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
R Programming |
%%R
emp %>%
group_by(job) %>%
slice(c(which.min(sal), which.max(sal)))
Results |
# A tibble: 10 x 8
# Groups: job [5]
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
4 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
5 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
7 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
8 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
9 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
10 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
R Programming |
%%R
emp %>%
dplyr::arrange(desc(sal)) %>%
split(f = emp$job) %>%
purrr::map_df(first)
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
3 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
4 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
5 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
5. R sqldf Package
- 직무(Job) 별 급여가 가장 높은 사람
R Programming |
%%R
sqldf(" SELECT ENAME,
JOB,
SAL,
FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY
FROM emp ")[1:10, ]
Results |
ename job sal FIRST_SALARY
1 SCOTT ANALYST 3000 SCOTT
2 FORD ANALYST 3000 SCOTT
3 MILLER CLERK 1300 MILLER
4 ADAMS CLERK 1100 MILLER
5 JAMES CLERK 950 MILLER
6 SMITH CLERK 800 MILLER
7 JONES MANAGER 2975 JONES
8 BLAKE MANAGER 2850 JONES
9 CLARK MANAGER 2450 JONES
10 KING PRESIDENT 5000 KING
6. Python pandasql Package
Python Programming |
ps.sqldf(" SELECT ENAME, \
JOB, \
SAL, \
FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY \
FROM emp ").head()
Results |
ename job sal FIRST_SALARY
0 SCOTT ANALYST 3000 SCOTT
1 FORD ANALYST 3000 SCOTT
2 MILLER CLERK 1300 MILLER
3 ADAMS CLERK 1100 MILLER
4 JAMES CLERK 950 MILLER
7. R data.table Package
R Programming |
%%R
DT <- data.table(emp)
DT[order(-sal) , first_value := first(ename), by = job][order(job),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno first_value
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 MILLER
4: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 MILLER
5: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 MILLER
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 MILLER
7: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
# s = dplyr::first(.SD, order_by = desc(sal) )
DT[, `:=` ( first_value = dplyr::first(ename, order_by = desc(sal) ),
first_value1 = na.omit(unlist(.SD))[1],
last_value = tail(na.omit(unlist(.SD)),1)),
by = job,
.SDcols = c("ename")][order(job),][, c("mgr","sal","comm","depno","job","hiredate") := NULL][1:8, ]
Results |
empno ename deptno first_value first_value1 last_value
1: 7788 SCOTT 20 SCOTT SCOTT FORD
2: 7902 FORD 20 SCOTT SCOTT FORD
3: 7369 SMITH 20 MILLER SMITH MILLER
4: 7876 ADAMS 20 MILLER SMITH MILLER
5: 7900 JAMES 30 MILLER SMITH MILLER
6: 7934 MILLER 10 MILLER SMITH MILLER
7: 7566 JONES 20 JONES JONES CLARK
8: 7698 BLAKE 30 JONES JONES CLARK
R Programming |
%%R
DT <- data.table(emp)
DT[order(-sal) , first_value := xts::first(ename) , by=job][order(job), ][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno first_value
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 SCOTT
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 SCOTT
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 MILLER
4: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 MILLER
5: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 MILLER
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 MILLER
7: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 JONES
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 JONES
9: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 JONES
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
R Programming |
%%R
DT <- data.table(emp)
DT[ order(-sal), `:=` (first_ename = ename[1],
first_ename1 = first(ename),
max_ename = max(ename),
min_ename = min(ename), last_ename=last(ename)),by=job][order(job,-sal),][, c("mgr","sal","comm","depno","job","hiredate") := NULL]
Results |
empno ename deptno first_ename first_ename1 max_ename min_ename last_ename
1: 7788 SCOTT 20 SCOTT SCOTT SCOTT FORD FORD
2: 7902 FORD 20 SCOTT SCOTT SCOTT FORD FORD
3: 7934 MILLER 10 MILLER MILLER SMITH ADAMS SMITH
4: 7876 ADAMS 20 MILLER MILLER SMITH ADAMS SMITH
5: 7900 JAMES 30 MILLER MILLER SMITH ADAMS SMITH
6: 7369 SMITH 20 MILLER MILLER SMITH ADAMS SMITH
7: 7566 JONES 20 JONES JONES JONES BLAKE CLARK
8: 7698 BLAKE 30 JONES JONES JONES BLAKE CLARK
9: 7782 CLARK 10 JONES JONES JONES BLAKE CLARK
10: 7839 KING 10 KING KING KING KING KING
11: 7499 ALLEN 30 ALLEN ALLEN WARD ALLEN MARTIN
12: 7844 TURNER 30 ALLEN ALLEN WARD ALLEN MARTIN
13: 7521 WARD 30 ALLEN ALLEN WARD ALLEN MARTIN
14: 7654 MARTIN 30 ALLEN ALLEN WARD ALLEN MARTIN
[참고] 그룹별 First / Last (단일 건의 중복 문제는 존재)
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[ DT[order(sal, decreasing=TRUE), .I[c(1L,.N)], by=job]$V1 ]
Results |
empno ename job mgr hiredate sal comm deptno
1: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
3: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
4: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
6: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
7: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
8: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
9: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
10: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
8. Python DuckDB의 SQL
Python Programming |
%%sql
SELECT ENAME,
JOB,
SAL,
FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY
FROM emp
LIMIT 6
Python Programming |
duckdb.sql(" SELECT ENAME, \
JOB, \
SAL, \
FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY \
FROM emp \
LIMIT 6 ").df()
Results |
ename job sal FIRST_SALARY
0 SCOTT ANALYST 3000 SCOTT
1 FORD ANALYST 3000 SCOTT
2 ALLEN SALESMAN 1600 ALLEN
3 TURNER SALESMAN 1500 ALLEN
4 WARD SALESMAN 1250 ALLEN
5 MARTIN SALESMAN 1250 ALLEN
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글