본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

ASCII Oracle Function [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table]

by 기서무나구물 2021. 11. 19.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ ASCII 함수]

     


    ASCII 함수는 주어진 char의 첫 문자의 아스키 값에 상응하는 10 진수 값을 반환한다.

     

     


    1. Oracle(오라클)

     

    직원 이름의 첫 번째 문자가 ASCII 코드값이 77인 직원의 정보를 반환한다.

    Oracle Programming
    SELECT ename
    FROM   emp
    WHERE  ASCII(SUBSTR(ename, 1, 1)) = 77
    ORDER 
       BY  ename;

     

    Results
    ENAME
    ---------
    MARTIN
    MILLER

     


    2. Python Pandas(파이썬)

     

    직원 이름의 첫 번째 문자가 ASCII 코드값이 77인 직원의 정보를 반환한다.

     

    • ord()
    Python Programming
    emp[emp['ename'].str.slice(0,1).apply(lambda x : ord(x)) == 77]

     

    Results
           empno      ename      job        mgr      hiredate     sal      comm      deptno
     4      7654      MARTIN     SALESMAN  7698.0   1981/09/28    1250     1400.0      30
    13      7934      MILLER     CLERK     7782.0   1982/01/23    1300        NaN      10

     

     


    3. R Programming (R Package)

     

    직원 이름의 첫 번째 문자가 ASCII 코드값이 77인 직원의 정보를 반환한다.

     

    • utf8ToInt() 함수
    R Programming
    %%R
    
    emp [ sapply(emp$ename, function(x) utf8ToInt(substr(x,1,1)) ) ==77 , ]

     

    Results
    # A tibble: 2 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
                      
    1  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    2  7934 MILLER CLERK     7782 1982-01-23  1300    NA     10

     

     


    4. R Dplyr Package

     

    직원 이름의 첫 번째 문자가 ASCII 코드값이 77인 직원의 정보를 반환한다.

     

    • utf8ToInt() 함수
    R Programming
    %%R
    
    emp %>%
      rowwise() %>%
      dplyr::filter( utf8ToInt(substr(ename,1,1)) ==77 )

     

    Results
    # A tibble: 2 x 8
    # Rowwise: 
      empno ename  job        mgr hiredate     sal  comm deptno
                      
    1  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    2  7934 MILLER CLERK     7782 1982-01-23  1300    NA     10

     

    • letter2num() 함수
    R Programming
    %%R
    
    letter2num <- function(x) {utf8ToInt(x)}
    
    emp %>%
      rowwise() %>%
      dplyr::filter( letter2num(stringr::str_sub(ename,1,1)) == 77 )

     

    Results
    # A tibble: 2 x 8
    # Rowwise: 
      empno ename  job        mgr hiredate     sal  comm deptno
                      
    1  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    2  7934 MILLER CLERK     7782 1982-01-23  1300    NA     10

     


    R Programming
    %%R
    
    emp %>% 
      dplyr::filter( purrr::pmap_dbl(list(stringr::str_sub(ename,1,1)), utf8ToInt)  == 77)

     

    Results
    # A tibble: 2 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
                      
    1  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    2  7934 MILLER CLERK     7782 1982-01-23  1300    NA     10

     


    5. R sqldf Package

     

    직원 이름의 첫 번째 문자가 ASCII 코드값이 77인 직원의 정보를 반환한다.

     

    • unicode() 함수
    R Programming
    %%R
    
    sqldf(" SELECT ename, unicode(SUBSTR(ename, 1, 1)) unicode, char(77)
            FROM emp
            WHERE unicode(SUBSTR(ename, 1, 1)) = 77
            ORDER BY ename; ")

     

    Results
       ename unicode char(77)
    1 MARTIN      77        M
    2 MILLER      77        M

     


    6. Python pandasql Package

     

    직원 이름의 첫 번째 문자가 ASCII 코드값이 77인 직원의 정보를 반환한다.

     

    • unicode() 함수
    Python Programming
    ps.sqldf(" SELECT ename, unicode(SUBSTR(ename, 1, 1)) unicode, char(77)   \
            FROM emp   \
            WHERE unicode(SUBSTR(ename, 1, 1)) = 77   \
            ORDER BY ename ")

     

    Results
    	ename	unicode		char(77)
    0	MARTIN	77		   M
    1	MILLER	77		   M

     


    7. R data.table Package

     

    • utf8ToInt() 함수
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, utf_int:= utf8ToInt (stringr::str_sub (ename,1,1) ), by = row.names(DT)][utf_int==77,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno utf_int
    1:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      77
    2:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10      77

     


    • letter2num() 함수
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    letter2num <- function(x) {utf8ToInt(x)}
    
    DT[, utf_int:= letter2num (stringr::str_sub (ename,1,1) ), by = row.names(DT)][utf_int==77,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno utf_int
    1:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      77
    2:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10      77

     


    • uft8ToInt() 함수와 purrr::pmap_dbl
    R Programming
    %%R
    
    DT[ purrr::pmap_dbl(list(stringr::str_sub(ename,1,1)), utf8ToInt)  == 77, ]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno utf_int
    1:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      77
    2:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10      77

     


    • uft8ToInt() 함수와 Map
    R Programming
    %%R
    
    DT[Map(utf8ToInt, stringr::str_sub(ename,1,1)) == 77,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno utf_int
    1:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      77
    2:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10      77

     


    • uft8ToInt() 함수와 sapply
    R Programming
    %%R
    
    DT[sapply(stringr::str_sub(ename,1,1), function(x) utf8ToInt(x))  == 77, ]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno utf_int
    1:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      77
    2:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10      77

     


    • uft8ToInt() 함수와 .SD (Subset of Data)
    R Programming
    %%R
    
    DT[, x1 := utf8ToInt(stringr::str_sub(.SD,1,1) ) , 1:nrow(DT) , .SDcols = c('ename')][x1==77,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno utf_int x1
    1:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      77 77
    2:  7934 MILLER    CLERK 7782 1982-01-23 1300   NA     10      77 77

     


    [참고] 누적합 참고

    R Programming
    %%R
    
    DT[, x := as.list(cumsum(unlist(.SD))), 1:nrow(DT) , .SDcols = c('sal')]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno utf_int x1    x
     1:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      83 83  800
     2:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30      65 65 1600
     3:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30      87 87 1250
     4:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      74 74 2975
     5:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30      77 77 1250
     6:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      66 66 2850
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      67 67 2450
     8:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20      83 83 3000
     9:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10      75 75 5000
    10:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30      84 84 1500
    11:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      65 65 1100
    12:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      74 74  950
    13:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20      70 70 3000
    14:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      77 77 1300

     


    8. Python Duckdb의 SQL

     

    Python Programming
    %%sql
      SELECT *
      FROM   emp
      WHERE  ASCII(SUBSTR(ename, 1, 1)) = 77
      ORDER
         BY  ename

     

    Python Programming
    duckdb.sql(" SELECT *                                 \
                 FROM   emp                               \
                 WHERE  ASCII(SUBSTR(ename, 1, 1)) = 77   \
                 ORDER                                    \
                    BY  ename  ").df()

     

    Results
       empno   ename       job     mgr    hiredate   sal    comm  deptno
    0   7654  MARTIN  SALESMAN  7698.0  1981/09/28  1250  1400.0      30
    1   7934  MILLER     CLERK  7782.0  1982/01/23  1300     NaN      10

     


     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록? 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크

    이미지 출처 :&nbsp;https://unsplash.com/photos/S5XsDlChGAQ

    반응형

    댓글