본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

STDDEV_POP 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 20.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ STDDEV_POP Oracle Function ]

     


    STDDEV_POP 함수는 지정한 인수 expr의 모표준편차를 반환한다. 집계 함수와 분석함수로 이용할 수 있다.

     

     

     

      
     


    1. Oracle(오라클)

     

    1) 집계 함수

     

    • STDDEV_POP() 함수

    급여의 모 표준편차를 반환한다.

     

    Oracle Programming
    SELECT STDDEV_POP(sal) STDDEVPOP_VAR
    FROM   emp

     

    Results
    STDDEVPOP_VAR
    --------------------------------------------
    1139.488618295281524185955567137220166443

     


    2) 분석함수

     

    • STDDEV_POP() 함수

    분석함수를 사용하여서 개별 부서별 급여의 모표준편차를 반환한다.

     

    Oracle Programming
    SELECT DEPTNO, EMPNO, 
           STDDEV_POP(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV_POP
    FROM   EMP;

     

    Results
    DEPTNO	EMPNO	STDDEV_POP
    --------------------------------------------------------
    10	7782	1546.142152441215721851972457687124239742
    10	7934	1546.142152441215721851972457687124239742
    10	7839	1546.142152441215721851972457687124239742
    20	7902	1004.738772019871839881384168371541966154
    20	7788	1004.738772019871839881384168371541966154
    20	7566	1004.738772019871839881384168371541966154
    20	7369	1004.738772019871839881384168371541966154
    20	7876	1004.738772019871839881384168371541966154
    30	7521	610.100173924104222773473633873610159005
    30	7654	610.100173924104222773473633873610159005
    30	7844	610.100173924104222773473633873610159005
    30	7900	610.100173924104222773473633873610159005
    30	7499	610.100173924104222773473633873610159005
    30	7698	610.100173924104222773473633873610159005

     


    2. Python Pandas(파이썬)

     

    1) 집계 함수

    • pstdev() 함수
    Python Programming
    from statistics import pstdev 
    
    pstdev(emp['sal'])

     

    Results
    1139.4886182952814

     


    2) 분석함수

    • pstdev() 함수
    Python Programming
    from statistics import pstdev
    
    withmooc = emp.copy()
    withmooc['sal_pstdev'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform(lambda x: pstdev(x))
    withmooc.sort_values('deptno').head(7)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_pstdev
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	1546.142152
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	1546.142152
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1546.142152
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1004.738772
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	1004.738772
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	1004.738772
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1004.738772
     

     


    3. R Programming (R Package)

     

    1) 집계 함수

    • multicon::popsd 함수
    R Programming
    %%R
    library(multicon)
    multicon::popsd(emp$sal)

     

    Results
    [1] 1139.489

     


    • 모표준편차 구현
    R Programming
    %%R
    
    sqrt((length(emp$sal)-1)/length(emp$sal)) * sd(emp$sal)

     

    Results
    [1] 1139.489

     


    2) 분석함수

    • popsd() 함수
    R Programming
    %%R
    withmooc <- emp
    withmooc['sal_pstdev'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { popsd(x) }))
    
    withmooc[order(withmooc$deptno),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_pstdev
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <dbl>
     1  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10      1546.
     2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10      1546.
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10      1546.
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20      1005.
     5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20      1005.
     6  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20      1005.
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20      1005.
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20      1005.
     9  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30       610.
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30       610.

     

     


    4. R Dplyr Package

     

    1) 집계 함수

     

    • multicon::popsd() 함수
    R Programming
    %%R
    
    library(multicon)
    
    emp %>%
      dplyr::summarize(sal_pstdev = multicon::popsd(sal) )

     

    Results
    # A tibble: 1 x 1
      sal_pstdev
           <dbl>
    1      1139.

     


    2) 분석함수

     

    • popsd() 함수
    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_pstdev = popsd(sal) ) %>%
      dplyr::arrange(deptno) %>%
      head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno sal_pstdev
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <dbl>
     1  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10      1546.
     2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10      1546.
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10      1546.
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20      1005.
     5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20      1005.
     6  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20      1005.
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20      1005.
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20      1005.
     9  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30       610.
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30       610.

     

     


    5. R sqldf Package

     

    1) 집계 함수

    • 모표준편차 구현
    R Programming
    %%R
    
    sqldf("select sqrt((count(sal)-1)/cast( count(sal) as real)) * stdev(sal) stddev_pop from emp ")

     

    Results
      stddev_pop
    1   1139.489

     


    • stddev_pop() 함수

    RH2 패키지에서 지원하는 stddev_pop() 함수를 사용한다.

    R Programming
    %%R
    library(RH2)
    
    sqldf(" SELECT stddev_pop(sal) stddev_pop FROM emp ")

     

    Results
      stddev_pop
    1   1139.489

     


    [참고] typeof / cat

    R Programming
    %%R
    
    sqldf("select typeof(13) int_num, typeof(14.0) real_num, 13/14 zero, 13/14.0 real_num_1, 13/cast(14 as real) real_num_2")

     

    Results
      int_num real_num zero real_num_1 real_num_2
    1 integer     real    0  0.9285714  0.9285714

     


    2) 분석함수

    • stddev_pop() 함수
    R Programming
    %%R
    
    sqldf(" select a.*,stddev_pop
            from   emp a
                   left join (SELECT deptno,stddev_pop(sal) stddev_pop FROM emp group by deptno) b
                        on    a.deptno = b.deptno
            order by a.deptno")

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno stddev_pop
    1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10  1546.1422
    2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10  1546.1422
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10  1546.1422
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20  1004.7388
    5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20  1004.7388
    6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20  1004.7388
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20  1004.7388
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20  1004.7388
    9   7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30   610.1002
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30   610.1002
    11  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30   610.1002
    12  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30   610.1002
    13  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30   610.1002
    14  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30   610.1002

     

     


    6. Python pandasql Package

    1) 집계 함수

    2) 분석함수

     

     

     


    7. R data.table Package

     

    1) 집계 함수

    • popsd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    library(multicon)
    
    DT[, list(sal_pstdev = popsd(sal) )]

     

    Results
       sal_pstdev
    1:   1139.489

     


    • popsd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    library(multicon)
    
    DT[, list(sal_pstdev = popsd(sal) ), by =deptno]

     

    Results
       deptno sal_pstdev
    1:     20  1004.7388
    2:     30   610.1002
    3:     10  1546.1422

     


    2) 분석함수

     

    • popsd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_pstdev := popsd(sal), by = deptno][order(deptno),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_pstdev
     1:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10  1546.1422
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10  1546.1422
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10  1546.1422
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20  1004.7388
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20  1004.7388
     6:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20  1004.7388
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20  1004.7388
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20  1004.7388
     9:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30   610.1002
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30   610.1002

     

     


    • popsd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, `:=` ( sal_pstdev = popsd(unlist(.SD)) ), 
         by = deptno,
         .SDcols = c("sal")][order(deptno),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_pstdev
     1:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10  1546.1422
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10  1546.1422
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10  1546.1422
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20  1004.7388
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20  1004.7388
     6:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20  1004.7388
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20  1004.7388
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20  1004.7388
     9:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30   610.1002
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30   610.1002

     


    8. Python Duckdb의 SQL

     

    1) 집계 함수

    • STDDEV_POP() 함수
    Python Programming
    %%sql
      SELECT STDDEV_POP(sal) as stdev_pop_func FROM emp

     

     

    Python Programming
    duckdb.sql(" SELECT STDDEV_POP(sal) as stdev_pop_func FROM emp ").df()

     

    Results
       stdev_pop_func
    0     1139.488618

     


    2) 분석 함수

    • STDDEV_POP() 함수
    Python Programming
    %%sql
      SELECT DEPTNO, EMPNO,
             STDDEV_POP(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV_POP
      FROM   emp
      order  by deptno
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT DEPTNO, EMPNO,                                               \
                        STDDEV_POP(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV_POP     \
                 FROM   emp                                                          \
                 order  by deptno                                                    \
                 LIMIT  6 ").df()

     

    Results
       deptno  empno   STDDEV_POP
    0      10   7782  1546.142152
    1      10   7839  1546.142152
    2      10   7934  1546.142152
    3      20   7369  1004.738772
    4      20   7566  1004.738772
    5      20   7788  1004.738772

     


    ( https://unsplash.com/photos/yG9pCqSOrAg )

     

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글