Time Complexity of dplyr functions

In summary, the basic dplyr manipulation functions run in O(N) time, while functions that join two data frames run in O(N+M).
  • #1
Trollfaz
TL;DR Summary
A question about the time complexity of functions in R's dplyr package.
Suppose I have a data frame/tibble of N observations, i.e. N rows; call it df1. Is the time complexity of dplyr's basic manipulation functions O(N)?
filter()
select()
mutate(), assuming the expression being computed is O(1) per row
rename()
summarize()
count()
separate()
unite()
spread()
gather()
If I have another data frame/tibble df2 with M rows, are the following join functions of time complexity O(N+M)?
inner_join(df1,df2)
right/left_join(df1,df2)
full_join(df1,df2)
 
  • #2
Broadly, yes. filter(), select(), rename(), summarize(), count(), separate(), unite(), spread(), and gather() each make a single pass over the rows, as does mutate() when the expression it computes is O(1) per row, so all of these run in O(N) time. One caveat: separate(), unite(), spread(), and gather() actually come from the tidyr package rather than dplyr, though the same reasoning applies. The joins inner_join(), left_join(), right_join(), and full_join() behave like hash joins in practice, so their expected cost is O(N+M) plus the size of the output. Note that with duplicated keys a join can produce far more than N+M rows, and writing that output is a lower bound on the running time.
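A quick empirical sanity check, rather than a proof: double the input size and the elapsed time of an O(N) verb should roughly double. A minimal sketch, assuming only dplyr is installed (column names and sizes are arbitrary):

library(dplyr)

# Time one O(N) verb and one O(N+M) join at n rows each.
time_at <- function(n) {
  df1 <- tibble(key = sample.int(n), x = runif(n))
  df2 <- tibble(key = sample.int(n), y = runif(n))
  c(filter = system.time(filter(df1, x > 0.5))["elapsed"],
    join   = system.time(inner_join(df1, df2, by = "key"))["elapsed"])
}

# Each doubling of n should roughly double both timings
# (modulo noise, garbage collection, and memory effects).
sapply(c(1e6, 2e6, 4e6), time_at)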
 

1. What is the time complexity of dplyr functions?

The time complexity depends on the specific function. Most single-table verbs make one pass over the data, giving O(n) in the number of rows n. The main exception is sorting: arrange() is O(n log n). Grouping with group_by() is typically hash-based, so it is O(n) on average rather than O(n log n).

2. How does the time complexity of dplyr functions compare to other data manipulation tools?

On in-memory data frames, dplyr is generally competitive with base R, while data.table is often faster on large data thanks to keyed indexing and in-place modification. Note that dplyr evaluates eagerly on ordinary data frames; lazy evaluation only applies to backends such as dbplyr (databases) or dtplyr (data.table), where a pipeline is translated and nothing runs until the result is collected.
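A minimal sketch of that lazy behaviour with the dbplyr database backend, assuming the DBI and RSQLite packages are installed (the table and column names are just examples):

library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "df1", data.frame(x = runif(100), g = rep(c("a", "b"), 50)))

# Nothing is computed here: the pipeline is only translated to SQL.
q <- tbl(con, "df1") %>%
  filter(x > 0.5) %>%
  group_by(g) %>%
  summarise(m = mean(x, na.rm = TRUE))

show_query(q)   # inspect the generated SQL; still no work done
collect(q)      # evaluation happens only now, inside the database

DBI::dbDisconnect(con)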

3. Are there any dplyr functions with a higher time complexity?

Yes. The clearest example is arrange(), since sorting n rows costs O(n log n). A grouped summarize() does extra work per group, but with hash-based grouping the total stays roughly O(n) as long as each summary expression is itself cheap. For joins, the expected cost is O(n + m) plus however many rows the join produces.
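A rough timing comparison of the sorting verb against hash-based grouping; a sketch only, with arbitrary sizes (absolute times will vary by machine):

library(dplyr)

df <- tibble(x = runif(2e6), g = sample.int(1000, 2e6, replace = TRUE))

system.time(arrange(df, x))                            # sort: O(n log n)
system.time(summarise(group_by(df, g), m = mean(x)))   # hash + one pass: ~O(n)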

4. How can I improve the time complexity of dplyr functions?

A few practical options: use dtplyr, which translates dplyr pipelines into data.table code; filter rows and drop unneeded columns as early as possible in the pipeline, so later steps touch less data; and avoid unnecessary sorting. The asymptotic complexity of an O(n) verb cannot be improved, but the constant factors and the effective n can.
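A minimal dtplyr sketch, assuming the dtplyr and data.table packages are installed (column names are placeholders):

library(dplyr)
library(dtplyr)

df <- tibble(g = sample(letters, 1e6, replace = TRUE), x = runif(1e6))

df %>%
  lazy_dt() %>%           # switch to the data.table backend; nothing runs yet
  group_by(g) %>%
  summarise(m = mean(x)) %>%
  as_tibble()             # forces the computation, executed via data.table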

5. Is the time complexity of dplyr functions affected by the type of data being manipulated?

To some extent. Row count dominates: the complexities above are all in terms of the number of rows. Column count matters for verbs that touch every column, such as select() on a wide frame or mutate() with across(everything(), ...), where the work scales with rows times columns. Column types mostly affect constant factors, e.g. string comparisons are slower than numeric ones.
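A sketch showing that width matters when a verb touches every column; the make_df() helper and the sizes are made up for illustration:

library(dplyr)

# Build a tibble with a given number of numeric columns.
make_df <- function(ncol, nrow = 1e5) {
  as_tibble(setNames(replicate(ncol, runif(nrow), simplify = FALSE),
                     paste0("v", seq_len(ncol))))
}

narrow <- make_df(10)
wide   <- make_df(200)

# Same row count, 20x the columns: expect roughly 20x the work.
system.time(mutate(narrow, across(everything(), ~ .x * 2)))
system.time(mutate(wide,   across(everything(), ~ .x * 2)))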
