본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

STDDEV 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 20.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ STDDEV Oracle Function ]

     


    STDDEV함수는 지정한 인수 expr의 표본 표준편차를 반환한다. 집계 함수와 분석함수로 이용할 수 있다. STDDEV_SAMP와의 차이점은 입력 데이터가 단지 1 행으로 구성된 경우 STDDEV는 Zero를 반환하고, STDDEV_SAMP는 Null을 반환한다.

     

     


    1. Oracle(오라클)

     

    1) 집계 함수

     

     

    급여의 표본 표준편차를 반환한다.

     

    Oracle Programming
    SELECT STDDEV(sal) as STDDEV_VAR
    FROM   EMP

     

    Results
    STDDEV_VAR
    -------------------------------------------
    1182.503223516271699458653359613061928508

     


    2) 분석함수

     

    개별 부서별 급여의 표본 표준편차를 반환한다.

     

    Oracle Programming
    SELECT DEPTNO, EMPNO, 
           STDDEV(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV
    FROM   EMP;

     

    Results
    DEPTNO	EMPNO	STDDEV
    --------------------------------------------------------
    10	7782	1893.629671644731419588549193663502608485
    10	7934	1893.629671644731419588549193663502608485
    10	7839	1893.629671644731419588549193663502608485
    20	7902	1123.332096933048557320586159803793607308
    20	7788	1123.332096933048557320586159803793607308
    20	7566	1123.332096933048557320586159803793607308
    20	7369	1123.332096933048557320586159803793607308
    20	7876	1123.332096933048557320586159803793607308
    30	7521	668.331255192114045025291283415053677345
    30	7654	668.331255192114045025291283415053677345
    30	7844	668.331255192114045025291283415053677345
    30	7900	668.331255192114045025291283415053677345
    30	7499	668.331255192114045025291283415053677345
    30	7698	668.331255192114045025291283415053677345

     


    2. Python Pandas(파이썬)

     

    1) 집계 함수

     

    • stdev() 함수

    급여의 표본 표준편차를 반환한다.

     

    Python Programming
    from statistics import stdev 
    
    stdev(emp['sal'])

     

    Results
    1182.5032235162716

     


    • std() 함수

    개별 부서별 급여의 표본 표준편차를 반환한다.

     

    Python Programming
    withmooc = emp.copy()
    
    sal_STDDEV = withmooc.sort_values('sal',ascending = False).groupby('deptno')['sal'].std()
    sal_STDDEV

     

    Results
    deptno
    10    1893.629672
    20    1123.332097
    30     668.331255
    Name: sal, dtype: float64

     


    2) 분석함수

     

    • transform('std')

    분석함수를 사용하여서 개별 부서별 급여의 표본 표준편차를 반환한다.

     

    Python Programming
    withmooc = emp.copy()
    withmooc['sal_STDDEV'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform('std')
    withmooc.sort_values('deptno').head(7)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_STDDEV
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	1893.629672
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	1893.629672
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1893.629672
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1123.332097
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	1123.332097
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	1123.332097
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1123.332097

     

     


    3. R Programming (R Package)

     

    1) 집계 함수

     

    • sd() 함수
    %%R
    sd(emp$sal)

     

    Results
    [1] 1182.503

     


    2) 분석함수

     

    • sd() 함수
    %%R
    withmooc <- emp
    withmooc['sal_STDDEV'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { sd(x) }))
    
    withmooc[order(withmooc$deptno),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_STDDEV
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <dbl>
     1  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10      1894.
     2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10      1894.
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10      1894.
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20      1123.
     5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20      1123.
     6  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20      1123.
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20      1123.
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20      1123.
     9  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30       668.
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30       668.

     

     


    4. R Dplyr Package

     

    1) 집계 함수

     

    • sd() 함수
    R Programming
    %%R
    
    emp %>%
      dplyr::summarize(sal_STDDEV = sd(sal) )

     

    Results
    # A tibble: 1 x 1
      sal_STDDEV
           <dbl>
    1      1183.

     


    2) 분석함수

     

    • sd() 함수
    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_STDDEV = sd(sal) ) %>%
      dplyr::arrange(deptno) %>%
      head()

     

    Results
    # A tibble: 6 x 9
    # Groups:   deptno [2]
      empno ename  job         mgr hiredate     sal  comm deptno sal_STDDEV
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <dbl>
    1  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10      1894.
    2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10      1894.
    3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10      1894.
    4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20      1123.
    5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20      1123.
    6  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20      1123.

     

     


    5. R sqldf Package

     

     

    1) 집계 함수

    • stdev() 함수
    R Programming
    %%R
    
    sqldf(" SELECT stdev(sal) sal_STDDEV FROM emp ")

     

    Results
      sal_STDDEV
    1   1182.503

     


    • stddev_samp() 함수

     

    RH2 패키지에서 지원하는 stddev_samp() 함수를 사용한다.

     

    R Programming
    %%R
    library(RH2)
    
    sqldf(" SELECT stddev_samp(sal) sal_STDDEV FROM emp ")

     

    Results
      sal_STDDEV
    1   1182.503

     

     


    2) 분석함수

     

    • stdev() 함수

    현재 stdev에 대한 분석 함수를 지원하지 않아서 그룹별 급여의 표준편차를 계산 후 데이터를 Join 하였다.

     

    R Programming
    %%R
    
    sqldf(" select a.*,b.sal_STDDEV
            from   emp a
                   left join (SELECT deptno,stdev(sal) sal_STDDEV FROM emp group by deptno) b
                        on    a.deptno = b.deptno
            order by a.deptno")[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno sal_STDDEV
    1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10  1893.6297
    2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10  1893.6297
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10  1893.6297
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20  1123.3321
    5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20  1123.3321
    6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20  1123.3321
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20  1123.3321
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20  1123.3321
    9   7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30   668.3313
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30   668.3313

     

     


    6. Python pandasql Package

    • 현재 지원하지 않음.

     

     


    7. R data.table Package

     

    1) 집계 함수

     

    • sd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, list(sal_STDDEV = sd(sal) ), by =deptno]

     

    Results
       deptno sal_STDDEV
    1:     20  1123.3321
    2:     30   668.3313
    3:     10  1893.6297

     


    2) 분석함수

    • sd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, b.sal_STDDEV := sd(sal), by = deptno][order(deptno),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno b.sal_STDDEV
     1:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    1893.6297
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    1893.6297
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    1893.6297
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20    1123.3321
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    1123.3321
     6:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    1123.3321
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1123.3321
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20    1123.3321
     9:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30     668.3313
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30     668.3313

     


    • sd() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, `:=` ( b.sal_STDDEV  = sd(unlist(.SD)) ), 
         by = deptno,
         .SDcols = c("sal")][order(deptno),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno b.sal_STDDEV
     1:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    1893.6297
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    1893.6297
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    1893.6297
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20    1123.3321
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    1123.3321
     6:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    1123.3321
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1123.3321
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20    1123.3321
     9:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30     668.3313
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30     668.3313

     


    8. Python Duckdb의 SQL

     

    1) 집계 함수

     

    Python Programming
    %%sql
      SELECT STDDEV(sal) as stdev_func FROM emp

     

    Python Programming
    duckdb.sql(" SELECT STDDEV(sal) as stdev_func FROM emp ").df()

     

    Results
        stdev_func
    0  1182.503224

     


    2) 분석 함수

     

    Python Programming
    %%sql
      SELECT DEPTNO, EMPNO,
             STDDEV(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV
      FROM   emp
      order  by deptno
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT DEPTNO, EMPNO,                                      \
                        STDDEV(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV    \
                 FROM   emp                                                 \
                 order  by deptno                                           \
                 LIMIT  6 ").df()

     

    Results
       deptno  empno       STDDEV
    0      10   7782  1893.629672
    1      10   7839  1893.629672
    2      10   7934  1893.629672
    3      20   7369  1123.332097
    4      20   7566  1123.332097
    5      20   7788  1123.332097

     


    Rolling dice on blue. (https://unsplash.com/photos/uKlneQRwaxY)

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글