[Year 4 - Big Data Technology] R Programming

※ Study only the lecture materials: chapters 2, 3, 4, 6, and 7, focusing on the exercises. These datasets come up: iris, Boston, Ozone, state.x77.

1. state.x77

1) data("state")

2) Problems

- Extract the element at the intersection of row 3 and column 8

state.x77[3, 8]

- Extract the values in rows 5, 22, 44 and columns 1, 4, 7

state.x77[c(5,22,44), c(1,4,7)]

- Extract columns 3 through 5 for every row except rows 5 through 49

state.x77[-(5:49), 3:5]

- Extract only the rows where Income, the second column of state.x77, is greater than 4000

state.x77[state.x77[,2] > 4000,]
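Because state.x77 is a matrix with row and column names, the same cells can also be addressed by name, which is often easier to read than positions. A minimal sketch, assuming the standard copy of state.x77 from the datasets package (where row 3 is Arizona and column 8 is Area); `by_position`, `by_name`, and `rich` are illustrative names:

```r
data("state")

# Position and name refer to the same cell
by_position <- state.x77[3, 8]
by_name <- state.x77["Arizona", "Area"]

# Logical row filter combined with named columns;
# drop = FALSE keeps the result a matrix even if only one row or column remains
rich <- state.x77[state.x77[, "Income"] > 4000, c("Population", "Income"), drop = FALSE]
head(rich)
```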

2. iris - data.frame

1) data(iris)

2) Problems

- Find the rows where Sepal.Width > 3.5

iris[iris$Sepal.Width > 3.5, ]

- Find the rows where Species == 'versicolor'

iris[iris$Species == "versicolor", ]

- List all column names of iris

names(iris)

colnames(iris)

- List only the column names that contain "Width" (case-insensitive) **

names(iris)[grep("Width", names(iris), ignore.case=TRUE)]

- Show only the first few rows of the columns whose names contain "Petal" **

head(iris[, grep("Petal", names(iris))])

- Print the indices of the rows in iris where Species is 'versicolor' **

which(iris$Species=='versicolor')

- Create a subset of the rows in iris where Species is 'versicolor' **

subset(iris, Species == 'versicolor')

iris[which(iris$Species=='versicolor'),]

3. Ozone

1) install.packages("mlbench")

library(mlbench)

data(Ozone)

plot(Ozone$V8, Ozone$V9)

2) Problems

- Axis labels (xlab, ylab) and title (main)

plot(Ozone$V8, Ozone$V9, xlab = "Sandburg Temp", ylab = "El Monte Temp", main="Ozone")

- Point symbol (pch) and point size (cex)

plot(Ozone$V8, Ozone$V9, xlab = "Sandburg Temp", ylab = "El Monte Temp", main="Ozone", pch="+")

** Search Google for "r pch symbols" to see the available symbols

plot(Ozone$V8, Ozone$V9, xlab = "Sandburg Temp", ylab = "El Monte Temp", main="Ozone", cex=.5)

- Point color (col)

plot(Ozone$V8, Ozone$V9, xlab = "Sandburg Temp", ylab = "El Monte Temp", main="Ozone", col="tan2")

- Changing the plot type (type)

data(cars)

str(cars)

head(cars)

---------------> loading the data

plot(cars) ## scatter plot

plot(cars, type="l") ## line plot

plot(cars, type="o", cex=.5) ## lines with points

- Line type (lty)

plot(cars, type="l", lty="dashed")

- Plot layout (mfrow): arrange several plots in one window

opar <- par(mfrow=c(2,1))

plot(Ozone$V8, Ozone$V9, xlab = "Sandburg Temp", ylab = "El Monte Temp", main = "Ozone")

plot(Ozone$V8, Ozone$V9, xlab = "Sandburg Temp", ylab = "El Monte Temp", main = "Ozone2")

par(opar)

4. Boston

install.packages("corrgram")

library("corrgram")

library(MASS)

data("Boston")

- boston.sub <- Boston[c('lstat','indus','nox','rm','medv')]

- corrgram(cor(boston.sub), upper.panel = panel.conf)

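- 2-by-2 scatter plots

par(mfrow=c(2,2))

# plot each of the four predictors against medv (medv itself is skipped so the plots fill the 2-by-2 grid)

invisible(sapply(setdiff(names(boston.sub), "medv"),function(x){

plot(boston.sub[,x],Boston[,"medv"],xlab=x,ylab="medv")

}))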

- pairs(boston.sub)

- Pearson correlation

cor.test(c(1,2,3,4,5), c(1,0,3,4,5), method = "pearson")
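The printed cor.test output is long, so it can help to pull out individual pieces of the result. A small sketch using the same two vectors; `x`, `y`, and `res` are illustrative names:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 0, 3, 4, 5)

res <- cor.test(x, y, method = "pearson")

res$estimate   # the correlation coefficient r
res$p.value    # p-value for the null hypothesis that the true correlation is 0
res$conf.int   # 95% confidence interval for r
```

The estimate is the same number that cor(x, y) returns; cor.test adds the significance test around it.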

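5. Extracting the element where a row and a column intersect (asked exactly as in the book, slides, and exercises)

- Extract the value at the intersection of row 3 and column 8 of a matrix

- Extracting only part of a matrix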
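The indexing patterns from the state.x77 section carry over to any matrix. A minimal sketch on a made-up 5 x 8 matrix (the matrix `m` and its values are assumptions for illustration only):

```r
# A small 5 x 8 matrix filled column by column with 1..40
m <- matrix(1:40, nrow = 5, ncol = 8)

m[3, 8]                   # single element: row 3, column 8
m[c(1, 3), c(2, 4, 6)]    # selected rows and columns
m[-(2:4), 3:5]            # drop rows 2 to 4, keep columns 3 to 5
m[m[, 2] > 7, ]           # rows whose second-column value exceeds 7
```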

6. Using Sepal.Length

- Selecting data differently depending on sepal length
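One possible reading of this item is splitting iris into groups by a Sepal.Length threshold. The sketch below is only an illustration: the cutoff of 5 is an assumption borrowed from the next problem, and `short`, `long`, and `bins` are made-up names.

```r
data(iris)

# Two ways to filter rows by a condition on Sepal.Length
short <- iris[iris$Sepal.Length <= 5, ]
long  <- subset(iris, Sepal.Length > 5)

# Every row ends up in exactly one of the two groups
nrow(short) + nrow(long) == nrow(iris)

# cut() labels each row by interval instead of splitting the data frame
bins <- cut(iris$Sepal.Length, breaks = c(4, 5, 6, 8))
table(bins)
```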

7. Classifying values as greater or less than 5

1) Problem: label a row "greater" if Sepal.Length > 5, otherwise "less"

- Using if-else in a for loop (1)

output <- character(nrow(iris)) # allocate the result vector before the loop

for(i in seq_len(nrow(iris))){

if(iris$Sepal.Length[i] > 5)

output[i] <- "greater"

else

output[i] <- "less"

}

- Using if-else in a for loop (2)

output <- character(nrow(iris)) # allocate the result vector before the loop

for(i in seq_len(nrow(iris)))

output[i] <- if(iris$Sepal.Length[i] > 5) "greater" else "less"

- ifelse()

output <- ifelse(iris$Sepal.Length >5, "greater", "less")

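- apply() **

# apply() on the whole data frame would convert every value to a string, so use only the numeric columns

output3<-apply(iris[, 1:4],1,function(x){

if(x['Sepal.Length']>5)

'greater'

else

'less'

})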
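To check that the different approaches agree, the results can be compared directly. The sketch below contrasts ifelse() with vapply(), which is not in the original notes and is shown only as one more option; `by_ifelse` and `by_vapply` are illustrative names.

```r
data(iris)

by_ifelse <- ifelse(iris$Sepal.Length > 5, "greater", "less")

# vapply() loops over the numeric vector directly and checks that
# every result is a single character string
by_vapply <- vapply(iris$Sepal.Length,
                    function(v) if (v > 5) "greater" else "less",
                    character(1))

identical(by_ifelse, by_vapply)
table(by_ifelse)
```

For a simple two-way label like this, the vectorized ifelse() is usually the shortest choice; the loop versions are mainly useful for practicing control flow.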

8. Using the apply family **

- How to use and apply the apply function

- tapply, sapply, and how to compute the median

1) Problems

- Create a matrix m and use apply() to count the elements less than 0 in each column

m <- matrix(data=cbind(rnorm(30,0), rnorm(30,2), rnorm(30,5)),nrow = 30, ncol = 3)

apply(m, 2, function(x){length(x[x<0])})

- Use apply() on the matrix m to compute the mean of the elements greater than 0 in each column

apply(m, 2, function(x){mean(x[x>0])})

- Using the barley dataset from library(lattice)

- Use lapply() with the unique() function to get the unique values in each column of barley

library(lattice)

barley

lapply(barley, function(x) unique(x))

- Use sapply() to count the unique values in each column of barley

sapply(barley, function(x) unique(x))

sapply(barley, function(x) length(unique(x)))

- Use tapply() to compute the mean Petal.Length by Species in the iris data

tapply(iris$Petal.Length, iris$Species, mean)
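The median works the same way as the mean above: pass median as the function. A short sketch, assuming the goal is a median per group and per column; `med_by_species` and `med_by_column` are illustrative names.

```r
data(iris)

# Median Petal.Length per species (tapply groups by a factor)
med_by_species <- tapply(iris$Petal.Length, iris$Species, median)
med_by_species

# Median of each numeric column (sapply loops over the columns)
med_by_column <- sapply(iris[, 1:4], median)
med_by_column
```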

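9. Bar plots, comparing medians with box plots, and computing the Pearson correlation coefficient

- Draw a bar plot; draw box plots and compare the medians

- Use cor.test for the Pearson correlation coefficient

- boxplot(iris$Sepal.Width)

- A plot for checking whether the medians of two box plots differ (if the notches do not overlap, the medians likely differ)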

sv <- subset(iris, Species=="setosa" | Species=="versicolor")

sv$Species <- factor(sv$Species)

boxplot(Sepal.Width ~ Species, data=sv, notch=TRUE)
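The numbers behind the notched plot can also be read off without drawing it. A minimal sketch, assuming the same two-species subset; `b` and `medians` are illustrative names, and droplevels() is used here in place of the factor() call above.

```r
data(iris)
sv <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# plot = FALSE returns the statistics without drawing anything
b <- boxplot(Sepal.Width ~ Species, data = sv, plot = FALSE)

# Row 3 of b$stats is the median of each box
medians <- setNames(b$stats[3, ], b$names)
medians

# b$conf holds the lower and upper notch limits (rows) per group (columns)
b$conf
```

If the two notch intervals in b$conf do not overlap, that matches what the plot shows visually.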

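10. col matrix, setting plot options (labels, x/y values, etc.)

- Plot options: changing the point color and combining two plots into one -> see Ozone

- Draw a scatter plot of Petal.Length and Petal.Width from the iris data

Color the points in that scatter plot by Species using "red", "green", and "blue"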

plot(iris$Petal.Length, iris$Petal.Width, pch=21, bg=c("red","green","blue") [unclass(iris$Species)])

- Bar plot

barplot(tapply(iris$Petal.Length, iris$Species, mean))

11. Review the functions in detail

1) Removing missing values

- Remove the missing value from c(1,2,NA,4)

a <- c(1,2,NA,4)

a[!is.na(a)]
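Besides logical indexing, many summary functions can skip missing values themselves. A brief sketch of two common alternatives; `cleaned` is an illustrative name.

```r
a <- c(1, 2, NA, 4)

mean(a)                  # NA: a single missing value makes the result missing
mean(a, na.rm = TRUE)    # skips the NA

# na.omit() drops the NA; as.vector() strips the bookkeeping attributes it adds
cleaned <- as.vector(na.omit(a))
cleaned
```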

2) which, subset

- Print the indices of the rows in iris where Species is 'versicolor'

which(iris$Species=='versicolor')

- Create a subset of the rows in iris where Species is 'versicolor'

subset(iris, Species == 'versicolor')

iris[which(iris$Species=='versicolor'),]

3) merge

4) cbind, rbind

5) sort

6) table(x): mode (most frequent value)
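The notes list items 3) to 6) without code, so here is a rough sketch of each on made-up data. The data frames `left` and `right`, the vector `x`, and all values are assumptions for illustration only.

```r
# Two small illustrative data frames (made-up values)
left  <- data.frame(id = c(1, 2, 3), score = c(90, 75, 60))
right <- data.frame(id = c(2, 3, 4), grade = c("B", "C", "D"))

# merge: inner join on id, so only ids present in both remain
merged <- merge(left, right, by = "id")
merged

# cbind adds a column; rbind adds a row (column names must match)
cbind(left, passed = left$score >= 70)
rbind(left, data.frame(id = 4, score = 85))

x <- c(3, 1, 2, 3, 3, 1)
sort(x)                      # ascending
sort(x, decreasing = TRUE)   # descending

# table counts each value; the name of the largest count is the mode (as a string)
counts <- table(x)
mode_value <- names(counts)[which.max(counts)]
mode_value
```

Note that which.max() returns only the first maximum, so ties between equally frequent values are not reported by this approach.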
