PERCENTILE_DISC 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

포스팅 목차

[ PERCENTILE_DISC Oracle Function ]

PERCENTILE_DISC함수는 이산 분포 모형을 가정하는 역 분포 함수이다. 사용자가 지정한 백분위수에 해당하는 데이터 값을 검색하고, 정확하게 일치하는 데이터 값이 없으면 지정한 백분위 값의 주변 데이터 정렬 옵션에 따라 우선 순위가 빠른 바로 앞의 관측치 값을 반환한다. 계산 과정에서 Null은 무시된다.

함수설명 : PERCENTILE_DISC 오라클 함수 링크
PERCENTILE_CONT는 데이터 세트에 있거나 없을 수 있는 적절한 값을 보간법을 사용하여서 선택하는 반면, PERCENTILE_DISC는 항상 해당 세트에서 백분위수 바로 앞에 실제로 존재하는 값을 반환합니다. 데이터 계수가 짝수인 경우 반환하는 값이 차이가 날 수 있다.

1. Oracle(오라클)

1) 집계함수

PERCENTILE_DISC() 함수

개별 부서 내에서 15% 백분위에 해당하는 급여를 반환한다.

Oracle Programming

SELECT DEPTNO, 
       PERCENTILE_DISC(0.15) WITHIN GROUP (ORDER BY SAL) AS PERCENTILE_DISC 
FROM   EMP 
GROUP  BY DEPTNO

Results

DEPTNO	PERCENTILE_DISC
----------------------------
10	1300
20	800
30	950

2) 분석함수

PERCENTILE_DISC() 함수

파티션에 해당하는 개별 부서 내에서 15% 백분위에 해당하는 급여를 기준 관측치에 반환한다.

Oracle Programming

SELECT DEPTNO, 
       SAL,
       PERCENTILE_DISC(0.15) WITHIN GROUP (ORDER BY SAL) OVER (PARTITION BY DEPTNO) AS PERCENTILE_DISC 
FROM   EMP
ORDER  BY DEPTNO, SAL;

Results

DEPTNO	SAL	PERCENTILE_DISC
--------------------------------
10	1300	1300
10	2450	1300
10	5000	1300
20	800	800
20	1100	800
20	2975	800
20	3000	800
20	3000	800
30	950	950
30	1250	950
30	1250	950
30	1500	950
30	1600	950
30	2850	950

2. Python Pandas(파이썬)

1) 집계함수

np.percentile() 함수

전체 관측치 중에서 백분위 15%에 해당하는 급여를 반환한다.

Python Programming

np.percentile(emp['sal'], 15)

Results

1092.5

np.percentile() 함수

개별 부서 내에서 15% 백분위 수에 해당하는 급여를 반환한다.

Python Programming

withmooc = copy.copy(emp)

withmooc.groupby('deptno')['sal'].apply(lambda x: np.percentile(x,0.15))

Results

deptno
10    1303.45
20     801.80
30     952.25
Name: sal, dtype: float64

2) 분석함수

np.percentile() 함수

Python Programming

withmooc = copy.copy(emp)

withmooc['pct_first'] = withmooc.groupby('deptno')['sal'].transform(lambda x: np.percentile(x, q=15))

withmooc.sort_values(['deptno','sal'],ascending = [True,True]).head(7)

Results

	empno	ename	job		mgr	hiredate	sal	comm	deptno	pct_first
13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1645
6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	1645
8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	1645
0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	980
10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	980
3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	980
7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	980

transform()과 np.percentile() 함수

Python Programming

withmooc = copy.copy(emp)

withmooc['pct_first'] = withmooc.sort_values('sal').groupby('deptno')['sal'].transform(lambda x: np.percentile(x,50))

withmooc.sort_values(['deptno','sal'],ascending = [True,True]).head(7)

Results

	empno	ename	job		mgr	hiredate	sal	comm	deptno	pct_first
13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450
6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450
8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450
0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	2975
10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	2975
3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	2975
7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	2975

scoreatpercentile() 함수

Python Programming

from scipy.stats import tmean, scoreatpercentile

withmooc = copy.copy(emp)

withmooc['pct_first'] = withmooc.groupby('deptno')['sal'].transform(lambda x: scoreatpercentile(x,50))

withmooc.sort_values(['deptno','sal'],ascending = [True,True]).head(7)

Results

	empno	ename	job		mgr	hiredate	sal	comm	deptno	pct_first
13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450
6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450
8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450
0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	2975
10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	2975
3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	2975
7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	2975

3. R Programming (R Package)

dplyr::cume_dist()와 사용자 정의 함수

R Programming

%%R

library(plyr)   # dplyr 보다 먼저 로딩
library(dplyr)

withmooc <- emp[order(emp$sal),]

withmooc['cume_dist'] = ave(x = withmooc[, 'sal'], withmooc[, 'deptno'], FUN = function (x) { dplyr::cume_dist(x) }  )

withmooc <- plyr::ddply(withmooc,.(deptno),transform, PERCENTILE_DISC = min(ifelse(cume_dist >= 0.15, sal , 'NaN')) )
withmooc[order(withmooc$deptno,withmooc$sal),]

withmooc$ename <- withmooc$job <- withmooc$hiredate <- NULL

withmooc[1:10, ]

Results

   empno  mgr  sal comm deptno cume_dist PERCENTILE_DISC
1   7934 7782 1300   NA     10 0.3333333            1300
2   7782 7839 2450   NA     10 0.6666667            1300
3   7839   NA 5000   NA     10 1.0000000            1300
4   7369 7902  800   NA     20 0.2000000             800
5   7876 7788 1100   NA     20 0.4000000             800
6   7566 7839 2975   NA     20 0.6000000             800
7   7788 7566 3000   NA     20 1.0000000             800
8   7902 7566 3000   NA     20 1.0000000             800
9   7900 7698  950   NA     30 0.1666667             950
10  7521 7698 1250  500     30 0.5000000             950

4. R Dplyr Package

cume_dist() 함수

R Programming

%%R

emp %>%
  dplyr::arrange(deptno, sal)  %>%
  dplyr::group_by(deptno) %>%
  dplyr::mutate( sal_cume_dist = cume_dist(sal) ) %>%
  dplyr::mutate( PERCENTILE_DISC = min(sal[sal_cume_dist >= 0.75]) ) %>%
  dplyr::select(-c(hiredate)) %>%
  head(10)

Results

# A tibble: 10 x 9
# Groups:   deptno [3]
   empno ename  job         mgr   sal  comm deptno sal_cume_dist PERCENTILE_DISC
   <dbl> <chr>  <chr>     <dbl> <dbl> <dbl>  <dbl>         <dbl>           <dbl>
 1  7934 MILLER CLERK      7782  1300    NA     10         0.333            5000
 2  7782 CLARK  MANAGER    7839  2450    NA     10         0.667            5000
 3  7839 KING   PRESIDENT    NA  5000    NA     10         1                5000
 4  7369 SMITH  CLERK      7902   800    NA     20         0.2              3000
 5  7876 ADAMS  CLERK      7788  1100    NA     20         0.4              3000
 6  7566 JONES  MANAGER    7839  2975    NA     20         0.6              3000
 7  7788 SCOTT  ANALYST    7566  3000    NA     20         1                3000
 8  7902 FORD   ANALYST    7566  3000    NA     20         1                3000
 9  7900 JAMES  CLERK      7698   950    NA     30         0.167            1600
10  7521 WARD   SALESMAN   7698  1250   500     30         0.5              1600

[참고]

R Programming

%%R

emp %>%
  dplyr::arrange(deptno, sal)  %>%
  dplyr::group_by(deptno) %>%
  dplyr::mutate( sal_cume_dist = cume_dist(sal),
                 sal_lag       = lag(dplyr::cume_dist(sal)) , n=1, default = 0, order_by = sal) %>%
  dplyr::filter(0.15 >= ifelse(is.na(sal_lag), 0 , sal_lag)  & 0.15 < sal_cume_dist) %>%
  dplyr::select(-c("ename","job","hiredate"))

Results

# A tibble: 3 x 10
# Groups:   deptno [3]
  empno   mgr   sal  comm deptno sal_cume_dist sal_lag     n default order_by
  <dbl> <dbl> <dbl> <dbl>  <dbl>         <dbl>   <dbl> <dbl>   <dbl>    <dbl>
1  7934  7782  1300    NA     10         0.333      NA     1       0     1300
2  7369  7902   800    NA     20         0.2        NA     1       0      800
3  7900  7698   950    NA     30         0.167      NA     1       0      950

5. R sqldf Package

함수 구현 : 전체에서 35% 값

R Programming

%%R

sqldf(" select min(sal) Per_35p
        from ( select sal 
               from emp
               order by sal
               limit floor(.35 * (select count(*) from emp))
              ) ")

Results

  Per_35p
1     800

그룹 10에서 35% 값

R Programming

%%R

sqldf(" select min(sal) Per_35p
        from ( select sal 
               from   emp
               where  deptno=10
               order by sal
               limit ceil( .35 * (select count(*) from emp where deptno=10) ) )
        ")

Results

  Per_35p
1    1300

first_value() 와 percent_rank() 함수

R Programming

%%R

sqldf(" SELECT DISTINCT
               deptno,
               first_value(sal) OVER ( ORDER BY CASE WHEN p1 <= 0.15 THEN p1 END NULLS LAST ) x1,
               first_value(sal) OVER ( PARTITION BY deptno 
                                       ORDER BY CASE WHEN p2 <= 0.15 THEN p2 END NULLS LAST) x2
        FROM ( SELECT deptno,
                      sal,
                      percent_rank() OVER (ORDER BY sal) p1,
                      percent_rank() OVER (PARTITION BY deptno ORDER BY sal) p2
               FROM emp
             ) t
       ORDER BY deptno")

Results

  deptno  x1   x2
1     10 800 1300
2     20 800  800
3     30 800  950

R Programming

%%R

sqldf(" select deptno,
               sal,CUME_DIST,
               min(case when CUME_DIST >= 0.15 then x.sal end) over (partition by x.deptno) as percent_15,
               min(case when CUME_DIST >= 0.25 then x.sal end) over (partition by x.deptno) as percent_25,
               min(case when CUME_DIST >= 0.50 then x.sal end) over (partition by x.deptno) as percent_50,
               min(case when CUME_DIST >= 0.75 then x.sal end) over (partition by x.deptno) as percent_75
from (select x.*,
             row_number() over (partition by x.deptno
                                order by x.sal
                               ) as seqnum,
             CUME_DIST() OVER (PARTITION BY x.deptno ORDER BY sal) CUME_DIST,
             count(*) over (partition by x.deptno) as cnt
      from emp x
     ) x   ")

Results

   deptno  sal CUME_DIST percent_15 percent_25 percent_50 percent_75
1      10 1300 0.3333333       1300       1300       2450       5000
2      10 2450 0.6666667       1300       1300       2450       5000
3      10 5000 1.0000000       1300       1300       2450       5000
4      20  800 0.2000000        800       1100       2975       3000
5      20 1100 0.4000000        800       1100       2975       3000
6      20 2975 0.6000000        800       1100       2975       3000
7      20 3000 1.0000000        800       1100       2975       3000
8      20 3000 1.0000000        800       1100       2975       3000
9      30  950 0.1666667        950       1250       1250       1600
10     30 1250 0.5000000        950       1250       1250       1600
11     30 1250 0.5000000        950       1250       1250       1600
12     30 1500 0.6666667        950       1250       1250       1600
13     30 1600 0.8333333        950       1250       1250       1600
14     30 2850 1.0000000        950       1250       1250       1600

6. Python pandasql Package

Python Programming

ps.sqldf(" select deptno,             \
               sal,cnt,CUME_DIST,     \
               min(case when CUME_DIST >= 0.15 then x.sal end) over (partition by x.deptno) as percent_15,     \
               min(case when CUME_DIST >= 0.25 then x.sal end) over (partition by x.deptno) as percent_25,     \
               min(case when CUME_DIST >= 0.50 then x.sal end) over (partition by x.deptno) as percent_50,     \
               min(case when CUME_DIST >= 0.75 then x.sal end) over (partition by x.deptno) as percent_75      \
from (select x.*,                                         \
             row_number() over (partition by x.deptno     \
                                order by x.sal            \
                               ) as seqnum,               \
             CUME_DIST() OVER (PARTITION BY x.deptno ORDER BY sal) CUME_DIST,     \
             count(*) over (partition by x.deptno) as cnt                         \
      from emp x                                                                  \
     ) x   ").head()

Results

	deptno	sal	cnt	CUME_DIST	percent_15	percent_25	percent_50	percent_75
0	10	1300	3	0.333333	1300		2450		2450		5000
1	10	2450	3	0.666667	1300		2450		2450		5000
2	10	5000	3	1.000000	1300		2450		2450		5000
  3	20	800	5	0.200000	800		1100		2975		3000
4	20	1100	5	0.400000	800		1100		2975		3000

7. R data.table Package

R Programming

%%R

DT          <- data.table(emp)
dept_DT     <- data.table(dept)

DT[ order(sal), PERCENTILE_DISC := sapply(.SD , function (x) { ifelse( dplyr::cume_dist(x) >= 0.75, as.list(x) , NULL )  }  ),    ## as.list에서 최초값을 가져오는 로직 이해 아직 안됨.
    by = deptno ,.SDcols=c("sal")][order(deptno,sal),][1:10, ]

Results

    empno  ename       job  mgr   hiredate  sal comm deptno PERCENTILE_DISC
 1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10            5000
 2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10            5000
 3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10            5000
 4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20            3000
 5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20            3000
 6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20            3000
 7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20            3000
 8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20            3000
 9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30            1600
10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30            1600

R Programming

%%R

DT          <- data.table(emp)
dept_DT     <- data.table(dept)

DT[order(sal), `:=` ( PER_DISC_15 = min(.SD[dplyr::cume_dist(.SD) >= 0.15]),
                      PER_DISC_25 = min(.SD[dplyr::cume_dist(.SD) >= 0.25]),
                      PER_DISC_50 = min(.SD[dplyr::cume_dist(.SD) >= 0.50]),
                      PER_DISC_75 = min(.SD[dplyr::cume_dist(.SD) >= 0.75])  ), by=deptno,.SDcols=c("sal")][order(deptno,sal),][, c("ename","job","mgr","hiredate") := NULL]

Results

    empno  sal comm deptno PER_DISC_15 PER_DISC_25 PER_DISC_50 PER_DISC_75
 1:  7934 1300   NA     10        1300        1300        2450        5000
 2:  7782 2450   NA     10        1300        1300        2450        5000
 3:  7839 5000   NA     10        1300        1300        2450        5000
 4:  7369  800   NA     20         800        1100        2975        3000
 5:  7876 1100   NA     20         800        1100        2975        3000
 6:  7566 2975   NA     20         800        1100        2975        3000
 7:  7788 3000   NA     20         800        1100        2975        3000
 8:  7902 3000   NA     20         800        1100        2975        3000
 9:  7900  950   NA     30         950        1250        1250        1600
10:  7521 1250  500     30         950        1250        1250        1600
11:  7654 1250 1400     30         950        1250        1250        1600
12:  7844 1500    0     30         950        1250        1250        1600
13:  7499 1600  300     30         950        1250        1250        1600
14:  7698 2850   NA     30         950        1250        1250        1600

**참고용 : * * duplicated(.SD) 기능도 있음**

R Programming

%%R

DT          <- data.table(emp)
dept_DT     <- data.table(dept)

DT[order(sal),  `:=` (  test      = duplicated(.SD),
                        test1     = dplyr::cume_dist(duplicated(.SD)),
                        var_list  = ifelse( dplyr::cume_dist(duplicated(.SD)) >= 0.75,.SD, NA_real_) ,
                        cume_dist = lapply( ifelse( dplyr::cume_dist(duplicated(.SD)) >= 0.75,.SD, NA_real_) ,max ) # 리스트에서 조건을 만족하는 최소값
                      )   ,
                 by=deptno, .SDcols = c("sal")][order(deptno,sal),][, c("empno","ename","comm","job","mgr","hiredate") := NULL]

Results

     sal deptno  test     test1                      var_list cume_dist
 1: 1300     10 FALSE 1.0000000                1300,2450,5000      5000
 2: 2450     10 FALSE 1.0000000                1300,2450,5000      5000
 3: 5000     10 FALSE 1.0000000                1300,2450,5000      5000
 4:  800     20 FALSE 0.8000000       800,1100,2975,3000,3000        NA
 5: 1100     20 FALSE 0.8000000       800,1100,2975,3000,3000        NA
 6: 2975     20 FALSE 0.8000000       800,1100,2975,3000,3000        NA
 7: 3000     20 FALSE 0.8000000       800,1100,2975,3000,3000        NA
 8: 3000     20  TRUE 1.0000000       800,1100,2975,3000,3000      3000
 9:  950     30 FALSE 0.8333333  950,1250,1250,1500,1600,2850        NA
10: 1250     30 FALSE 0.8333333  950,1250,1250,1500,1600,2850        NA
11: 1250     30  TRUE 1.0000000  950,1250,1250,1500,1600,2850      2850
12: 1500     30 FALSE 0.8333333  950,1250,1250,1500,1600,2850        NA
13: 1600     30 FALSE 0.8333333  950,1250,1250,1500,1600,2850        NA
14: 2850     30 FALSE 0.8333333  950,1250,1250,1500,1600,2850        NA

8. Python Duckdb의 SQL

Python Programming

%%sql
SELECT DEPTNO,
       PERCENTILE_DISC(0.15) WITHIN GROUP (ORDER BY SAL) AS PERCENTILE_DISC
FROM   emp
GROUP  BY DEPTNO

Python Programming

duckdb.sql(" SELECT DEPTNO,                                                                   \
                    PERCENTILE_DISC(0.15) WITHIN GROUP (ORDER BY SAL) AS PERCENTILE_DISC      \
             FROM   emp                                                                       \
             GROUP  BY DEPTNO ").df()

Results

   deptno  PERCENTILE_DISC
0      20              800
1      30              950
2      10             1300

2) 집계 함수

테이블 조인

PARTITION BY 은 현재 지원 안 함.

: PERCENTILE_DISC(0.15) WITHIN GROUP (ORDER BY SAL) OVER (PARTITION BY DEPTNO) AS PERCENTILE_DISC

Python Programming

%%sql
select a.*,
       PERCENT_DISC
from   emp a
       left join ( SELECT DEPTNO,
                          percentile_disc(0.75) WITHIN GROUP (ORDER BY SAL) AS PERCENT_DISC
                   FROM   emp
                   group by deptno) b
            on a.deptno = b.deptno
order  by a.deptno
LIMIT  6

Python Programming

duckdb.sql(" select a.deptno,a.empno,a.ename,job,sal,                                                                                      \
                    PERCENT_DISC                                                                              \
             from   emp a                                                                                     \
                    left join ( SELECT DEPTNO,                                                                \
                                       percentile_disc(0.75) WITHIN GROUP (ORDER BY SAL) AS PERCENT_DISC      \
                                FROM   emp                                                                    \
                                group by deptno) b                                                            \
                         on a.deptno = b.deptno                                                               \
             order  by a.deptno                                                                               \
             LIMIT  6 ").df()

Results

   deptno  empno   ename        job   sal  PERCENT_DISC
0      10   7782   CLARK    MANAGER  2450          5000
1      10   7839    KING  PRESIDENT  5000          5000
2      10   7934  MILLER      CLERK  1300          5000
3      20   7369   SMITH      CLERK   800          3000
4      20   7566   JONES    MANAGER  2975          3000
5      20   7788   SCOTT    ANALYST  3000          3000

서브쿼리 방

Python Programming

%%sql
select a.*,
       (  SELECT percentile_disc(0.75) WITHIN GROUP (ORDER BY SAL) AS PERCENT_DISC
          FROM   emp b
          where  a.deptno = b.deptno) PERCENT_DISC
from   emp a
order  by a.deptno
LIMIT  6

Python Programming

duckdb.sql(" select a.deptno,a.empno,a.ename,job,sal,                                               \
                    (  SELECT percentile_disc(0.75) WITHIN GROUP (ORDER BY SAL) AS PERCENT_DISC     \
                       FROM   emp b                                                                 \
                       where  a.deptno = b.deptno) PERCENT_DISC                                     \
             from   emp a                                                                           \
             order  by a.deptno                                                                     \
             LIMIT  6 ").df()

Results

   deptno  empno   ename        job   sal  PERCENT_DISC
0      10   7782   CLARK    MANAGER  2450          5000
1      10   7839    KING  PRESIDENT  5000          5000
2      10   7934  MILLER      CLERK  1300          5000
3      20   7369   SMITH      CLERK   800          3000
4      20   7566   JONES    MANAGER  2975          3000
5      20   7788   SCOTT    ANALYST  3000          3000

Python Programming

%%sql
  select deptno,
         sal,cnt,CUME_DIST,
         min(case when CUME_DIST >= 0.15 then x.sal end) over (partition by x.deptno) as percent_15,
         min(case when CUME_DIST >= 0.25 then x.sal end) over (partition by x.deptno) as percent_25,
         min(case when CUME_DIST >= 0.50 then x.sal end) over (partition by x.deptno) as percent_50,
         min(case when CUME_DIST >= 0.75 then x.sal end) over (partition by x.deptno) as percent_75
  from ( select x.*,
                row_number() over (partition by x.deptno
                                   order by x.sal
                                  ) as seqnum,
                CUME_DIST() OVER (PARTITION BY x.deptno ORDER BY sal) CUME_DIST,
                count(*) over (partition by x.deptno) as cnt
         from emp x
      ) x
  order by x.deptno
  LIMIT 6

Python Programming

duckdb.sql(" select deptno,                                                                                               \
                    sal,cnt,CUME_DIST,                                                                                    \
                    min(case when CUME_DIST >= 0.15 then x.sal end) over (partition by x.deptno) as perc_15,              \
                    min(case when CUME_DIST >= 0.25 then x.sal end) over (partition by x.deptno) as perc_25,              \
                    min(case when CUME_DIST >= 0.50 then x.sal end) over (partition by x.deptno) as perc_50,              \
                    min(case when CUME_DIST >= 0.75 then x.sal end) over (partition by x.deptno) as perc_75               \
             from ( select x.*,                                                                                           \
                           row_number() over (partition by x.deptno                                                       \
                                              order by x.sal                                                               \
                                             ) as seqnum,                                                                 \
                           CUME_DIST() OVER (PARTITION BY x.deptno ORDER BY sal) CUME_DIST,                               \
                           count(*) over (partition by x.deptno) as cnt                                                   \
                    from emp x                                                                                            \
                  ) x                                                                                                      \
             order by x.deptno                                                                                            \
             LIMIT 6 ").df()

Results

   deptno   sal  cnt  CUME_DIST  perc_15  perc_25  perc_50  perc_75
0      10  1300    3   0.333333     1300     1300     2450     5000
1      10  2450    3   0.666667     1300     1300     2450     5000
2      10  5000    3   1.000000     1300     1300     2450     5000
3      20   800    5   0.200000      800     1100     2975     3000
4      20  1100    5   0.400000      800     1100     2975     3000
5      20  2975    5   0.600000      800     1100     2975     3000

--------------------------------------------

[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크

저작자표시 비영리 변경금지

'통계프로그램 비교 시리즈 > 오라클함수 비교(R & Python)' 카테고리의 다른 글

RANK 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2021.12.13
POWER 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2021.12.13
PERCENT_RANK 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2021.12.10
NVL2 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2021.12.09
NVL 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2021.12.09

PERCENTILE_DISC 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

[ PERCENTILE_DISC Oracle Function ]

1. Oracle(오라클)

1) 집계함수

2) 분석함수

2. Python Pandas(파이썬)

1) 집계함수

2) 분석함수

3. R Programming (R Package)

4. R Dplyr Package

[참고]

5. R sqldf Package

함수 구현 : 전체에서 35% 값

그룹 10에서 35% 값

first_value() 와 percent_rank() 함수

6. Python pandasql Package

7. R data.table Package

**참고용 : * * duplicated(.SD) 기능도 있음**

8. Python Duckdb의 SQL

2) 집계 함수

'통계프로그램 비교 시리즈 > 오라클함수 비교(R & Python)' 카테고리의 다른 글

댓글

티스토리툴바

PERCENTILE_DISC 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

[ PERCENTILE_DISC Oracle Function ]

1. Oracle(오라클)

1) 집계함수

2) 분석함수

2. Python Pandas(파이썬)

1) 집계함수

2) 분석함수

3. R Programming (R Package)

4. R Dplyr Package

[참고]

5. R sqldf Package

함수 구현 : 전체에서 35% 값

그룹 10에서 35% 값

first_value() 와 percent_rank() 함수

6. Python pandasql Package

7. R data.table Package

참고용 : * * duplicated(.SD) 기능도 있음

8. Python Duckdb의 SQL

2) 집계 함수

'통계프로그램 비교 시리즈 > 오라클함수 비교(R & Python)' 카테고리의 다른 글

관련글

댓글

티스토리툴바

**참고용 : * * duplicated(.SD) 기능도 있음**