Chapter 3 Basic syntax

3.1 Python

3.1.1 Create and play with variables

Say you want to add two numbers. Sure, you definitely can do that:

2 + 3
## 5

This alone is not too useful, though. We are not using the perks of objects, and the statement above is not much better than using a calculator. Let’s create two objects and assign integers to them.

a = 2
b = 3
c = a + b
print(c)
## 5

The function print() here prints out the value of an object. Without that last line, all the operations would still have happened, but we would never see the result. So we can see that we are operating on these objects. Let’s check what class the object c belongs to. The function type() allows us to look up the class of an object:

type(c)
## <class 'int'>

There are not that many operations we can do with an integer. Let’s reassign one of the objects as a float.

b = 3.5
c = a + b
print(c)
## 5.5

Now we can apply different functions to this object, for example, round it to the nearest integer with round(), which returns an integer when called with a single argument.

round(c)
## 6
type(round(c))
## <class 'int'>
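One detail is worth checking for yourself: round() rounds to the nearest integer rather than always upward, and Python 3 resolves exact ties toward the even neighbor. The following is a minimal sketch contrasting it with math.ceil and math.floor; the variable names are only illustrative.

```python
import math

c = 5.5
nearest = round(c)             # tie between 5 and 6: the even neighbor, 6, wins
tie_down = round(2.5)          # tie between 2 and 3: the even neighbor, 2, wins
always_up = math.ceil(2.1)     # ceil always rounds upward
always_down = math.floor(2.9)  # floor always rounds downward

print(nearest, tie_down, always_up, always_down)
```

If you really need to round up every time, math.ceil is the safer choice.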

You can name your objects with almost any name, except for the reserved ones (e.g., you can’t name an object True, because True is a Boolean value). Let’s see how to create objects of other classes.

x1 = False

Let’s check if x1 is False:

x1 == False
## True

So it is true that the object x1 is False.

False
## False

What about strings?

x2 = "False"
x2 == False
## False

It is not true now, because x2 is nothing but a string of characters and not a Boolean variable.

You can index a string, i.e., access a specific element (character) in a string: just like “H” is the first letter in “Hello”, “e” is the second, and so on. Be aware, however, that indexing in Python starts with zero, so that the first element has an index [0], the second – [1], the third – [2], and so on. You can also index a sequence from the end using a minus sign, so that the last character is indexed as [-1], the second to the last – [-2]. You can also specify a range of indexed values with a colon, so that [0:3] denotes the characters from the first through the third (the element at the stop index, here [3], is not included). Indexing in Python is specified with square brackets:

x2
## 'False'
x2[0]
## 'F'
x2[1]
## 'a'
x2[2]
## 'l'
x2[-1]
## 'e'
x2[0:3]
## 'Fal'
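Since the stop index of a slice is easy to get wrong, here is a small sketch of a few slicing variants. The variable names are made up for illustration; the slicing syntax itself is standard Python.

```python
word = "False"

first_three = word[0:3]   # indices 0, 1, 2; index 3 is excluded
same_thing = word[:3]     # an omitted start defaults to the beginning
last_two = word[-2:]      # an omitted stop runs to the end
every_other = word[::2]   # a third number sets the step

print(first_three, same_thing, last_two, every_other)
```

A handy consequence of excluding the stop index is that word[:3] + word[3:] always gives back the whole string.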

In Python, the plus operator applied to strings concatenates them:

x2[0:3] + x2[1] + x2[0] + x2[4] + x2[2]
## 'FalaFel'

Similar rules apply to lists. You can index lists, e.g.,

x3 = list(x2)
x3
## ['F', 'a', 'l', 's', 'e']
x3[0:3]
## ['F', 'a', 'l']

Lists come with a number of methods. One of the most common is append(), which adds another object to the end of the list. Notice how methods in Python are applied to objects with a period, so that you have object.method(argument).

x4 = [1, 2, 3, 4]
x4
## [1, 2, 3, 4]
x4.append(5)
x4
## [1, 2, 3, 4, 5]

Each class of objects has a specific list of methods that can be applied to such objects. You might need to read the Python documentation for different classes (e.g., here). You can also save the output of a method into another object:

x4.append(5)
x5 = x4.count(5)
x5
## 2
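To give a feel for what else is available, here is a brief sketch of a few other built-in list methods. The list numbers is a stand-in created just for this example. Note that most of these methods change the list in place and return None, which is an easy thing to trip over.

```python
numbers = [1, 2, 3, 4, 5, 5]

numbers.remove(5)            # removes the first 5 only: [1, 2, 3, 4, 5]
numbers.insert(0, 0)         # insert value 0 at position 0: [0, 1, 2, 3, 4, 5]
last = numbers.pop()         # removes and returns the last element (5)
numbers.reverse()            # reverses in place: [4, 3, 2, 1, 0]
position = numbers.index(3)  # position of the first 3
result = numbers.append(9)   # append changes the list and returns None

print(numbers, last, position, result)
```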

3.1.2 Operators

Objects in programming languages are often handled with certain operators. Here are the most common ones:

  • Assignment:

    • = – assign

    • += – add and assign (x += 1 is the same as x = x + 1)

  • Comparison:

    • == – check if equal (output is either True or False)

    • != – check if not equal

    • >, >=, <, <= – check if larger than, larger or equal to, less than, less or equal to

  • Arithmetic:

    • +, -, *, / – add, subtract, multiply, divide

    • ** – exponentiation

    • % – modulo (remainder from division)

    • // – floor division (the quotient rounded down to an integer)

  • Logical:

    • and, or, not – used in Boolean expressions, e.g., to check if both expressions are satisfied (and), or either is satisfied (or)

    • is, is not – check whether objects are identical

  • Membership:

    • in, not in – checks if an element is a member of some list or other sequence
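The operators listed above can be tried out in a few lines. This is only an illustrative sketch with made-up values; the difference between == and is in particular is worth seeing once.

```python
x = 7
x += 1                        # x is now 8

quotient = x // 3             # floor division: 2
remainder = x % 3             # modulo: 2
power = x ** 2                # exponentiation: 64
in_range = x > 5 and x <= 10  # both comparisons hold: True

a = [1, 2]
b = [1, 2]
equal = a == b                # same contents: True
identical = a is b            # two separate objects: False
member = 2 in a               # membership test: True

print(quotient, remainder, power, in_range, equal, identical, member)
```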

3.1.3 Comments

It may be useful to leave some verbal comments in your code that won’t be read and executed by the computer. In Python, a hash sign (#) comments out a whole line of code or everything after it on the same line:

a = 2
b = 3
# b = 5000
a + b # still 5, so b was never assigned 5000; this text is ignored too
## 5

3.1.4 Loops

Loops are used for iterations over a sequence, meaning that you go over each element and do something with it. for loops are more common and do something for each element, and while loops conduct iterations as long as some logical condition is satisfied. For example,

for i in x4:
  print(i)
## 1
## 2
## 3
## 4
## 5
## 5

does pretty much the following: for each element of x4, let’s call this element i, and then print this i.

In the while loop, on the other hand

number = 1 # initialize the variable
# you can explicitly define the class, in which case it would be
# number = int(1)

while number < 5:
  print(number)
  number += 1 # same as "number = number + 1"
## 1
## 2
## 3
## 4

which the computer understands as the following: create a variable number and assign it a value of 1; then run an iteration, at each step of which print out the value of number and increase its value by one; this process is repeated as long as the statement number < 5 is True; once number becomes 5, this statement becomes False and the loop stops.

While loops are a bit tricky because you can accidentally end up with an infinite loop. For example,

while True:
  print("AAAAAA")

If you run it, your computer will never stop yelling at you because True is always True, so the while loop always executes the next iteration.
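One common way to keep such a loop under control is to count iterations and leave the loop with break. The sketch below is just one possible safeguard; max_iterations and shouts are hypothetical names introduced for this example.

```python
max_iterations = 3   # hypothetical safety limit
count = 0
shouts = []

while True:
    shouts.append("AAAAAA")
    count += 1
    if count >= max_iterations:
        break        # leave the loop even though the condition is still True

print(shouts)
```

If you do start an infinite loop by accident, you can usually interrupt it with Ctrl + C in a terminal.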

3.1.5 If/else

What if you run a loop and want different outcomes under different conditions? That’s where you would use an if/else statement. For example, for your sequence you can check if a number is odd or even:

for i in x4:
  if i % 2 == 0:
    print(str(i)+" is even")
  else:
    print(str(i)+" is odd")
## 1 is odd
## 2 is even
## 3 is odd
## 4 is even
## 5 is odd
## 5 is odd

Here, we apply the modulo operator % to our value i to check whether the remainder of dividing i by 2 equals zero; if True, then i is even, else it is odd.

There could be more complex logical operations. What if the sequence represents something that we want to classify into three levels: low (0–1), medium (2–4), or high (5 and higher)?

for i in x4:
  if i <= 1:
    print(str(i)+" is low")
  elif i > 1 and i <= 4:
    print(str(i)+" is medium")
  else:
    print(str(i)+" is high")
## 1 is low
## 2 is medium
## 3 is medium
## 4 is medium
## 5 is high
## 5 is high

where elif is short for “else if”.
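Because an elif branch is only reached when every earlier condition has failed, the lower bound in the middle condition can be dropped. The sketch below shows this variant; it collects the labels in a list (the names values and labels are introduced here for illustration) instead of printing them.

```python
values = [1, 2, 3, 4, 5, 5]   # same contents as x4 at this point in the chapter
labels = []

for i in values:
    if i <= 1:
        labels.append("low")
    elif i <= 4:              # only reached when i > 1, so no need to re-check it
        labels.append("medium")
    else:
        labels.append("high")

print(labels)
```

Python also accepts chained comparisons, so the middle condition could be written as 1 < i <= 4 if you prefer to state both bounds.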

3.1.6 Functions

Just like in math, a function is something that receives some input, or argument, and returns some output. For example, we want to write a function \(f(x) = x^2\):

def f(x):
  return x**2

def means “define a function”. The computer will remember this function as f; it has one argument, x, and it raises the argument to the power of 2. You can save the output of a function in another variable.

y = f(2)
y
## 4

You can apply functions in a loop, e.g.,

for i in x4:
  print(f(i))
## 1
## 4
## 9
## 16
## 25
## 25

A function can have several arguments. For example, what if we want to use power different from 2?

def exp(x, e = 2):
  return x**e

In that function, we defined a default option for argument e. Now, if the function doesn’t receive the value of this argument, it will always assume that e = 2.

It is a good idea to name the arguments explicitly; otherwise, Python reads them in the order in which they were written in the function definition. So the code exp(2, 5) will generate the same output as exp(x = 2, e = 5).
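A quick sketch makes the difference between positional and named arguments concrete. It reuses the exp function defined above; the variable names on the left are only there to hold the results.

```python
def exp(x, e=2):
    return x**e

default = exp(3)                  # e falls back to its default of 2
positional = exp(2, 5)            # arguments matched by position
keyword = exp(x=2, e=5)           # arguments matched by name
swapped = exp(e=5, x=2)           # named arguments can come in any order
reversed_positional = exp(5, 2)   # positional order matters: this is 5**2

print(default, positional, keyword, swapped, reversed_positional)
```

Naming the arguments protects you from the last case, where swapping two positional values silently changes the result.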

Note that loops and Python in general are indent-sensitive: all statements in a loop or function should be indented in a stepwise manner. For example, this code will throw out an error because there is an indent missing:

def exp(x, e = 2):
return x**e
## expected an indented block after function definition on line 1 (<string>, line 2)

3.1.7 Libraries

Sometimes you want to do something, but there is no tool for it in the basic syntax and it is too hard to program on your own. Well, chances are, some programmers have already done that, packed their code into an archive somewhere online, and you can easily reuse their functions. Those archives are often called “packages”, “libraries”, or “modules”. To use a module, you need to download and build it on your machine (which usually is done with designated software), and load it into your memory before you use it. This way, you won’t need to re-type somebody else’s code in your script (because it can be thousands of lines long).

For example, I want to generate a random number. Though it seems simple, there is some math involved and I don’t want to waste time tackling this problem. Well, there exists a Python module for random number generation called random.

Here is the part I don’t like too much compared to R: installing a module can be tricky, especially for a beginner. And if for some reason you need several modules that are interdependent, get ready to spend a day solving compatibility issues between them.

Anyway, you might need to install pip on your computer. This little guy, a package manager, is supposed to make it easier to install modules and avoid dependency errors. Once it is installed, you run pip through the command line like this (erm, this particular command actually won’t work because random comes with Python when you install the language, but this is the syntax you could use when installing a new package):

pip install random

Now, at the beginning of a script, we need to tell Python the module to load. After that is done, you can call objects from the module like module.object:

import random # load the module

random_list = [] # initialize the list

for i in range(10): # 10 iterations
  x = random.randint(0, 100) # draw a random integer between 0 and 100 (both ends included)
  random_list.append(x) # append it to the list

print(random_list)
## [81, 58, 57, 78, 84, 91, 62, 94, 100, 32]

You can also use aliases for module names and their parts as follows:

import random as rand

random_list = []

for i in range(10):
  x = rand.randint(0, 100)
  random_list.append(x)

print(random_list)
## [7, 43, 24, 76, 42, 31, 28, 79, 64, 34]
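The two runs above produce different numbers, which is the whole point of a random generator. When you need the same numbers every time (for example, to make an analysis reproducible), you can fix the starting point with random.seed. In the sketch below, the helper draw and the seed value 42 are arbitrary choices made for this example.

```python
import random

def draw(n, seed):
    """Return n random integers between 0 and 100 for a given seed."""
    random.seed(seed)                 # reset the generator to a known state
    values = []
    for _ in range(n):
        values.append(random.randint(0, 100))
    return values

first_run = draw(5, seed=42)
second_run = draw(5, seed=42)

print(first_run == second_run)        # the same seed gives the same sequence
```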

3.1.8 Lambda functions

Lambda functions are short, anonymous functions written as a single expression; they are useful for tasks where you don’t really need to define a whole function with def. A lambda function that is not assigned to a name is discarded as soon as it has been used.

def big_function(a, b, add = True):
  if add:
    small_function = lambda a, b : a + b
  else:
    small_function = lambda a, b : a - b
  output = small_function(a, b)
  return output

In this function, we first check if the argument add is True (note that if add and if add == True produce the same result here: when add is True, the statement add == True also evaluates to True). Depending on it, we create a small_function which adds or subtracts the arguments. Then we just apply the small_function to the big_function’s arguments a and b without a need to define a clumsy small_function with def and all that stuff.
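Lambda functions are probably most often seen as throwaway arguments to other functions. The sketch below uses the built-in sorted and map; the list birds and its counts are made up for illustration.

```python
birds = [("robin", 3), ("goldfinch", 5), ("crow", 1)]

# sort the pairs by their second element (the count)
by_count = sorted(birds, key=lambda pair: pair[1])

# apply a small transformation to every element
names = list(map(lambda pair: pair[0].upper(), birds))

# a lambda can even be called right where it is written
total = (lambda a, b: a + b)(2, 3)

print(by_count, names, total)
```

In all three cases the lambda is never given a name, so nothing is left behind once the line has run.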

3.1.9 List comprehension

List comprehension is an interesting technique that could be used to create sequences that follow a certain pattern without writing a clumsy for loop. For example, the statement

doubles1 = [x*2 for x in range(0, 10)]

produces the same output as

doubles2 = []

for x in range(0, 10):
  doubles2.append(x*2)

Let’s compare them:

doubles1
## [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
doubles2
## [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

You can also embed if statements in list comprehensions, e.g.,

[x**2 for x in range(0, 10) if x>3]
## [16, 25, 36, 49, 64, 81]
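There are two different places where an if can appear in a comprehension, and they do different things. The following sketch (with illustrative variable names) shows both: an if at the end filters elements out, while an if/else at the front chooses the value for every element.

```python
# filter: keep only even x, then square them
squares_of_evens = [x**2 for x in range(10) if x % 2 == 0]

# choose: every x is kept, but the value depends on the condition
parity = ["even" if x % 2 == 0 else "odd" for x in range(4)]

print(squares_of_evens)
print(parity)
```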

3.2 R

3.2.1 Create and play with variables

The Python style of assigning values to objects with an equal sign (e.g., x = 1) is valid in R. However, it is far more popular (and easier to read) to use an “arrow” operator <-, which in RStudio can be typed with the shortcut LAlt + - (left Alt and minus sign).

x <- 1
x
## [1] 1

Boolean values are written as TRUE/FALSE or abbreviated T/F:

FALSE == F
## [1] TRUE

So it is true that false is false.

Characters and numeric objects are created in the same way,

x1 <- 1.5
x2 <- "x2"

Sequences of objects are called vectors and are created using a function c()

v1 <- c(1, 2, 3, 4, 5)
v1
## [1] 1 2 3 4 5

You can also create sequences of integers using colon,

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

Or sequences of values using dedicated functions

seq(1, 5, 0.5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
seq(from = 1, to = 5, length.out = 4)
## [1] 1.000000 2.333333 3.666667 5.000000

Note that indexing in R, unlike Python, begins with 1, and a negative index drops the element at that position instead of counting from the end:

v1[1]
## [1] 1
v1[1:3]
## [1] 1 2 3
v1[-4]
## [1] 1 2 3 5

Vectors can only contain objects of the same class. R will either throw an error or try to change all objects to one class if you try to violate this rule:

c(1, 2, "three")
## [1] "1"     "2"     "three"

The simplest combination of vectors is a matrix. It is the same as a matrix in math and you can do linear algebra in basic R.

matrix(data = 1:9, ncol = 3, nrow = 3, byrow = T)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

What R calls a list is different from a Python list. It is more similar to JSON or XML kinds of objects, and perhaps closest to Python dictionaries. It can be used to store different types of data with different dimensionality.

l1 <- list("first element" = T,
           "second element" = c(1, 2, 3, 4),
           "third element" = matrix(data = 1:9, ncol = 3, nrow = 3))
l1
## $`first element`
## [1] TRUE
## 
## $`second element`
## [1] 1 2 3 4
## 
## $`third element`
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Note that R is not an indent-sensitive language, and you can break a statement across several lines for better readability. It all gets executed as if it were a single line:

l2 <- list("first element" = T, "second element" = c(1, 2, 3, 4), "third element" = matrix(data = 1:9, ncol = 3, nrow = 3))
l2
## $`first element`
## [1] TRUE
## 
## $`second element`
## [1] 1 2 3 4
## 
## $`third element`
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

You can index list elements by their index or name, e.g., (note that you have to use double brackets to index list elements)

l1[["first element"]]
## [1] TRUE
l1[[3]]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
l1[[2]][3]
## [1] 3

3.2.2 Operators and comments

There are some differences in notation compared to the operators used in Python.

Note that R operators work on vectors. In all fairness, what I said earlier about R object types wasn’t exactly correct: all single-element objects are vectors, so it’s not like “a vector is a sequence of several (say) numeric objects”, but “a single numeric value is just a numeric vector of length 1”. So look what happens when a simple arithmetic operator is applied to vectors:

(-2:2) + (1:5)
## [1] -1  1  3  5  7

The operation is repeated for all elements of the vector.
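For comparison, plain Python lists do not behave this way: + concatenates two lists instead of adding them element by element. A rough Python counterpart of the R statement above, assuming nothing beyond the standard library, could look like this (left and right are illustrative names):

```python
left = list(range(-2, 3))    # [-2, -1, 0, 1, 2], like -2:2 in R
right = list(range(1, 6))    # [1, 2, 3, 4, 5], like 1:5 in R

concatenated = left + right  # + glues the lists together (10 elements)
elementwise = [a + b for a, b in zip(left, right)]  # pairwise sums

print(concatenated)
print(elementwise)
```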

  • Assignment:

    • = or <- – assign

    • there is also a right assignment -> but I have never used it

  • Comparison:

    • == – check if equal (output is either TRUE or FALSE)

    • != – check if not equal

    • >, >=, <, <= – check if larger than, larger or equal to, less than, less or equal to

  • Arithmetic:

    • +, -, *, / – add, subtract, multiply, divide

    • ^ or ** – exponentiation

    • %% – modulo (remainder from division)

    • %/% – floor division (the quotient rounded down to an integer)

    • %*% – matrix multiplication

  • Logical:

    • & – vectorized AND operator

    • && – elemental AND operator, should only be used with single-element statements

    • | and || – vectorized and elemental OR operator, see below what the difference is

  • Membership:

    • %in% – checks if an element is a member of a vector; there is no “not in” operator so I often define it in my scripts: '%!in%' <- function(x,y)!('%in%'(x,y))

  • Pipe:

    • %>% if you use tidyverse, or |> in R 4.1.0 and later: the idea of the pipe is to make a statement that passes the result of one function to another function more readable, so instead of code that reads function4(function3(function2(function1(x)))) you could write

x %>%
  function1() %>%
  function2() %>%
  function3() %>%
  function4()

Regarding the difference between vectorized and elemental logical operators, vectorized ones always apply to all elements of a vector, e.g.,

((-2:2) >= 0) & ((-2:2) <= 0)
## [1] FALSE FALSE  TRUE FALSE FALSE

First, R evaluates the statements ((-2:2) >= 0) (outputs F F T T T) and ((-2:2) <= 0) (outputs T T T F F), and then applies the & operator to all five pairs; only the third pair is T in both vectors.

Now, the statement ((-2:2) >= 0) && ((-2:2) <= 0) will throw an error in recent versions of R because && can only be used with two single-element objects. This can be useful when you must be sure that you are checking single logical values: if a vector creeps into your code, the code won’t run.

As for comments, it’s the same as in Python: everything between a # sign and the end of the line is commented out and is not executed by R.

3.2.3 Loops and if/else

The syntax is fairly similar to Python, but again, indentation is not important in R. Unlike in Python, the statement you iterate over is enclosed in parentheses, while the body of the loop is enclosed in curly brackets.

odds <- numeric() # initialize empty numeric vector
evens <- numeric()

for (i in 1:10){
  if (i %% 2 == 0){
    evens <- c(evens, i) # vectors are not modified in place, so you reassign the object at each step
  }else{
    odds <- c(odds, i)
  }
}

odds
## [1] 1 3 5 7 9
evens
## [1]  2  4  6  8 10

A neat function in R that makes if/else statements easier is ifelse (just like in Excel), which lets you fit a whole statement in one line:

x <- 1
ifelse(x == 1, "x equals 1", "x isn't equal to 1")
## [1] "x equals 1"
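For readers coming from the Python section, the closest Python counterpart is the conditional expression value_if_true if condition else value_if_false. The sketch below is an assumption about how one might mirror the R line above, not an exact equivalent: R's ifelse also works on whole vectors at once, which in Python takes a list comprehension.

```python
x = 1
message = "x equals 1" if x == 1 else "x isn't equal to 1"

# applying the same test to several values at once
values = [1, 2, 1]
messages = ["equals 1" if v == 1 else "isn't equal to 1" for v in values]

print(message)
print(messages)
```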

3.2.4 Functions

Functions in R are defined as follows:

e <- function(x, e = 2){
  return(x^e)
}

So it’s very similar to Python. You don’t even need to explicitly say what to return: R returns the value of the last evaluated statement. Still, typing return is a good habit: sometimes your function can analyze a huge dataset and return nothing because you didn’t explicitly write it to return anything! Also, with short functions, you don’t even need to use any curly brackets, so the function above can be stripped all the way to

e <- function(x, e = 2) x^e

3.2.5 Apply and such

apply is a family of functions that can be used to iterate a function over a multi-element object, just like list comprehension in Python (although apply can be much more complex). For example,

sapply(1:5, e)
## [1]  1  4  9 16 25

We just defined the function e above so that sapply knows what it is, but more often you would write temporary functions directly in there (just like lambda functions in Python):

sapply(1:5, function(x){
  if (x %% 2 == 0){
    paste(x, "is even") # btw, + doesn't concatenate strings in R like in Python
  }else{
    paste(x, "is odd")
  }
})
## [1] "1 is odd"  "2 is even" "3 is odd"  "4 is even" "5 is odd"

sapply is used for simple vectors, lapply – to iterate over a list, and apply – for two-dimensional datasets where you also have to specify a dimension (1 for row-wise operations, 2 – for column-wise), e.g.,

x <- matrix(1:9, nrow = 3, ncol = 3)
x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
apply(x, 1, sum)
## [1] 12 15 18
apply(x, 2, sum)
## [1]  6 15 24
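For comparison, the same row and column sums can be sketched in plain Python with list comprehensions. Here the matrix is represented as a list of rows, which is just one possible convention chosen for this example; zip(*x) regroups the rows into columns.

```python
# the same 3 x 3 matrix as above, stored as a list of rows
x = [[1, 4, 7],
     [2, 5, 8],
     [3, 6, 9]]

row_sums = [sum(row) for row in x]        # like apply(x, 1, sum)
col_sums = [sum(col) for col in zip(*x)]  # like apply(x, 2, sum)

print(row_sums)
print(col_sums)
```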

3.2.6 Libraries

R libraries are the same as modules in Python. However, there is a central archive for R libraries called CRAN, which is maintained by the same people who develop and maintain the R language. So you don’t need any external pips and such: just type install.packages("YOUR-PACKAGE-NAME") directly in R, and the package will get installed on your computer. At the beginning of each session, you have to load the package you want to use with the statement library(yourpackagename). Note that we use quotes in install.packages but not in library.

CRAN has quite strict requirements for packages, and some packages can be found elsewhere, e.g., on GitHub. For example, in my undergrad I developed some utilities for my work that really are redundant relative to big packages like vegan, which is very popular in ecology. You can still install such packages using another package, devtools:

install.packages("devtools") # remember that you only call this once to install it on your computer
library(devtools) # but this one you call every time you start a new R session
install_github("OleksiiDubovyk/ancends") # also only call once
library(ancends) # and here is my baby

3.3 Exercise

3.3.1 Exercise 1

You are going shopping for groceries, and you have a shopping list on you. Once you arrive at the store, you get a text from your partner with some more stuff to buy. Merge these two lists using different approaches: (1) in a single line of code, (2) using a for loop, and (3) using Python’s list comprehension or lambda functions, or R’s apply loops.

Just to get you started, here are the lists. Define your R vector or Python list objects using appropriate syntax.

your_list contains "apples", "oranges", "bananas", "milk"

partner_list contains "flour", "eggs", "potatoes", "cookies", "tea"

You want to end up with an updated your_list containing "apples", "oranges", "bananas", "milk", "flour", "eggs", "potatoes", "cookies", "tea".

3.3.2 Exercise 2

Learn how to deal with such sequences using the example of the updated your_list:

  1. Try indexing to return the first 3 and the last 3 elements in the list.

  2. Return only the elements that are 5 characters or longer (hint: you can get the string length using len("text") in Python, or nchar("text") in base R, or str_length("text") from the stringr package in R).

3.3.3 Exercise 3

Create a function that receives an argument of a numeric value and returns “odd” if the number is odd or “even” if the number is, well, even.

Hint: you might need to use a modulo operator: odd_number %% 2 will return 1, even_number %% 2 will return 0.

Once you get there, to make it even cooler, make this function return a full sentence. E.g., if I call yourfunction(2), it should return "2 is even".

3.3.4 Exercise 4

Virginia state tax depends on income. If you make between $0 and $3,000, your tax is 2%; at $3,001 – $5,000 you get taxed $60 plus 3% of any excess over $3,000; at $5,001 – $17,000 you get taxed $120 plus 5% of any excess over $5,000; and if you make more than $17,000, the state will tax you $720 plus 5.75% of any excess over $17,000.

For example, if one makes $23,456, then they get taxed $720 plus \(0.0575 \cdot (23456 - 17000)\), which is 5.75% of $6,456 and comes to $371.22. So the total tax will be $720 + $371.22 = $1,091.22.

Your task is, you guessed it, to write a function that receives the annual income and returns the tax owed.

3.3.5 Exercise 5

Now, finally, ecological stuff. If you want to describe biological diversity in a community, the easiest way is to count the number of species. For example, imagine a community where you have 2 Carolina Chickadees, 5 American Goldfinches, 3 American Robins, 1 American Crow, 1 House Finch, and 1 Cooper’s Hawk. The total species richness is 6 species, but it doesn’t reflect the distribution of species abundances (e.g., we could have had 1 individual for each species, but still would end up with the species richness of 6). For that, there exist some indices of diversity, for example, Shannon index:

\[- \sum \limits_{i=1}^S \frac{n_i}{N} \cdot \ln \left( \frac{n_i}{N} \right)\]

where \(n_i\) is the number of individuals of the \(i\)th species and \(N\) is the total number of individuals of all species. Here, you have to calculate the proportion of each \(i\)th species in the community of \(S\) species and take the natural logarithm of those proportions. For our example, we have \(S = 6\), \(N = 2 + 5 + 3 + 1 + 1 + 1= 13\). For Carolina Chickadee, \(n_i/N = 2/13 = 0.154\), for American Goldfinch – \(n_i/N = 5/13 = 0.385\), for American Robin – \(n_i/N = 3/13 = 0.231\), and for everyone else \(n_i/N = 1/13 = 0.077\). Now we need to find logarithms of those values: \(\ln (1/13) = -2.56\), \(\ln (2/13) = -1.87\), \(\ln (3/13) = -1.47\), and \(\ln (5/13) = -0.96\). Now we just multiply \((1/13)\cdot\ln(1/13) = -0.197\), \((2/13)\cdot\ln(2/13) = -0.288\), \((3/13)\cdot\ln(3/13) = -0.338\), and \((5/13)\cdot\ln(5/13) = -0.368\). All we have left is to take a sum of all those values: \((-0.197) + (-0.197) + (-0.197) + (-0.288) + (-0.338) + (-0.368) = -1.585\). And our final answer is to take a negative of that, so that Shannon index for a vector of abundances \((1,1,1,2,3,5)\) would be about \(1.585\) (or \(1.586\) if the intermediate values are not rounded).

To calculate logarithms in Python, you need to import the module math and use the function math.log. In R, it’s just a built-in function log. Now, write a function that takes an argument of a sequence of numbers and returns Shannon index.