포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ ROW_NUMBER Oracle Function ]
ROW_NUMBER함수는 분석 함수이다. 이 함수는 처리되는 범위(파티션에 포함된 모든 행 또는 쿼리에 의해서 반환되는 모든 행)에서 존재하는 모든 관측치에 대하여 order_by_clause에서 지정된 행의 순서대로 1로 시작하여 각 행에 유일한 (unique) 번호를 부여한다.
- 함수 설명 : ROW_NUMBER 오라클 함수 링크
1. Oracle(오라클)
row_number() 함수
개별 직무 내에 속한 직원의 관측치에 대하여 월급을 기준으로 정렬 후 중복 없이 유일한 번호를 부여하라.
Oracle Programming |
SELECT job, ename, sal, RN
FROM ( SELECT job, ename, sal,
ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
FROM emp
)
WHERE RN <= 1
order by 1
Results |
JOB ENAME SAL RN
--------------------------------------
ANALYST SCOTT 3000 1
CLERK SMITH 800 1
MANAGER CLARK 2450 1
PRESIDENT KING 5000 1
SALESMAN WARD 1250 1
2. Python Pandas(파이썬)
cumcount() 함수
Python Programming |
withmooc = copy.copy(emp)
withmooc['row_num'] = withmooc.sort_values(['sal'], ascending=[True]).groupby('job').cumcount() + 1
withmooc.sort_values(['job','sal']).head(7)
Results |
empno ename job mgr hiredate sal comm deptno row_num
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 1
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 2
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 2
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 3
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 4
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1
rank 함수
- method=’first’ : 동일한 값으로 구성된 관측치에 대하여 데이터 상에서 먼저 반환되는 관측치부터 순위를 부여한다.
Python Programming |
withmooc = copy.copy(emp)
withmooc['sal_rank'] = withmooc.sort_values('sal',ascending = True).groupby('job')['sal'].transform(lambda x: x.rank(method='first',ascending=True) )
withmooc.sort_values(['job','sal']).head()
Results |
empno ename job mgr hiredate sal comm deptno sal_rank
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 1
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 2
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 2
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 3
3. R Programming (R Package)
dplyr::row_number 함수
R Programming |
%%R
withmooc <- emp[order(withmooc$sal),]
withmooc['sal_rank'] = with(withmooc, ave(sal, job, FUN =function(x) { dplyr::row_number(x) }))
withmooc[order(withmooc$job,withmooc$sal),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_rank
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
rank 함수
- method=’first’ : 동일한 값으로 구성된 관측치에 대하여 데이터 상에서 먼저 반환되는 관측치부터 순위를 부여한다.
- na.last = “keep” : 결측치를 원래의 위치에 순위로 NA 값을 그대로 반환한다.
R Programming |
%%R
withmooc <- emp
withmooc['sal_rank'] = with(withmooc, ave(sal, job, FUN =function(x) { rank(x, ties.method = "first", na.last = "keep") }))
withmooc[order(withmooc$job,withmooc$sal),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_rank
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
4. R Dplyr Package
row_number() 함수
R Programming |
%%R
emp %>%
dplyr::arrange(sal) %>%
dplyr::group_by(job) %>%
dplyr::mutate(row_num = row_number(sal)) %>%
dplyr::arrange(job,sal) %>%
head()
Results |
# A tibble: 6 x 9
# Groups: job [2]
empno ename job mgr hiredate sal comm deptno row_num
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <int>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
5. R sqldf Package
ROW_NUMBER 함수
R Programming |
%%R
sqldf(" SELECT job, ename, sal,
ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
FROM emp ")[1:10, ]
Results |
job ename sal RN
1 ANALYST SCOTT 3000 1
2 ANALYST FORD 3000 2
3 CLERK SMITH 800 1
4 CLERK JAMES 950 2
5 CLERK ADAMS 1100 3
6 CLERK MILLER 1300 4
7 MANAGER CLARK 2450 1
8 MANAGER BLAKE 2850 2
9 MANAGER JONES 2975 3
10 PRESIDENT KING 5000 1
row_number() 함수
개별 직무 내에서 최소 급여를 받는 사원의 명단을 출력하시오.
R Programming |
%%R
sqldf(" SELECT job, ename, sal,RN
FROM ( SELECT job, ename, sal,
ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
FROM emp
)
WHERE RN <= 1 ")
Results |
job ename sal RN
1 ANALYST SCOTT 3000 1
2 CLERK SMITH 800 1
3 MANAGER CLARK 2450 1
4 PRESIDENT KING 5000 1
5 SALESMAN WARD 1250 1
6. Python pandasql Package
Python Programming
ps.sqldf(" SELECT job, ename, sal,RN \
FROM ( SELECT job, ename, sal, \
ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN \
FROM emp \
) \
WHERE RN <= 1 ")
Results |
job ename sal RN
0 ANALYST SCOTT 3000 1
1 CLERK SMITH 800 1
2 MANAGER CLARK 2450 1
3 PRESIDENT KING 5000 1
4 SALESMAN WARD 1250 1
7. R data.table Package
dplyr::row_number 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(sal),row_num := dplyr::row_number(sal), by=job][order(job,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno row_num
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
개별 직무내에서 데이터를 정렬 후 순위 부여(1:그룹 내 관측치 개수)
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(sal),row_num := 1:.N , by=job][order(job,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno row_num
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
seq_len 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(sal),row_num := seq_len(.N) , by=job][order(job,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno row_num
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
그룹내 부여된 행 번호(row.names)를 사용하여 관측치 번호(순위) 부여
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(sal),row_num := row.names(.SD) , by=job][order(job,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno row_num
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
data.table::frank
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(sal),row_num := data.table::frank(sal,ties.method="first") , by=job][order(job,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno row_num
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 2
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 3
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 4
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
8. Python Duckdb의 SQL
ROW_NUMBER() 함수
Python Programming |
%%sql
SELECT job, ename, sal
FROM ( SELECT job, ename, sal,
ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
FROM emp
)
WHERE RN <= 1
order by 1
Python Programming |
duckdb.sql(" SELECT job, ename, sal \
FROM ( SELECT job, ename, sal, \
ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN \
FROM emp \
) \
WHERE RN <= 1 \
order by 1 ").df()
Results |
Arc_Cosine
0 1.266104
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글