Python List Remove Duplicates: Explanations and Examples
In this post, we'll cover six topics related to removing duplicates from a Python list:
- Defining a function to remove all duplicates from a list
- Removing duplicates in a nested list
- Removing duplicates when randomly sampling a list
- Keeping order when removing duplicates
- Removing duplicates and sorting
- Removing duplicates across two lists
Each topic is briefly explained and illustrated with an example.
1. Defining a function to remove all duplicates from a list
To remove all duplicates from a Python list, we can use the set() function.
set is a built-in Python data type that does not allow duplicate items.
If you're interested in learning more, check out the post Lists vs Tuples vs Dictionaries vs Sets.
The set() function converts an iterable, such as a list, into a set and automatically removes duplicates.
You can use it to write a function that removes all duplicates from a list. Let's look at the sample code:
def remove_duplicates(lst):
    return list(set(lst))

my_list = [1, 2, 3, 3, 4, 2, 1]
result = remove_duplicates(my_list)
print(result)
# Output
[1, 2, 3, 4]
The code above uses set() to remove duplicates, then converts the result back to a list with list() and returns it.
One thing to note is that the original order of the list is not guaranteed to be preserved. Python sets are unordered, so the items may come back in a different order when the set is converted back to a list.
If you want to preserve order when removing duplicates in lists, see Section 4.
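As a small illustration of the ordering caveat, here is a sketch that uses strings instead of integers. The list `words` is made up for this example. String hashes are randomized per Python process by default, so the order of `unique_words` may differ between runs; the checks below therefore compare contents rather than order.

```python
def remove_duplicates(lst):
    # Same approach as above: set() drops duplicates, list() converts back.
    return list(set(lst))

# Hypothetical sample data for this illustration.
words = ["pear", "apple", "pear", "fig", "apple"]
unique_words = remove_duplicates(words)

# The same items survive, whatever order they come back in.
print(sorted(unique_words))              # ['apple', 'fig', 'pear']
# Comparing as sets ignores order, so this check is safe.
print(set(unique_words) == set(words))   # True
# The length tells us how many distinct items there were.
print(len(unique_words))                 # 3
```

If your code depends on the position of items, avoid relying on the order returned by list(set(...)).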
2. Removing duplicates in a nested list
Removing duplicates from a nested list in Python is a little different.
You can't use set() to remove duplicates from a nested list, because the inner lists are mutable and therefore unhashable (set() raises a TypeError), so you need to use a loop instead.
The following is sample code for a function to remove duplicates from a nested list:
def remove_duplicates_nested(lst):
    result = []
    for sublist in lst:
        if sublist not in result:
            result.append(sublist)
    return result

my_list = [[1, 2, 3], [3, 4, 5], [1, 2, 3], [6, 7, 8]]
result = remove_duplicates_nested(my_list)
print(result)
# Output
[[1, 2, 3], [3, 4, 5], [6, 7, 8]]
The code above creates an empty list result and iterates over the nested list lst, checking whether each sublist already exists in result.
A sublist is added to result only if it is not already there.
This removes duplicate sublists while keeping the first occurrence of each, and finally returns the unique result.
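The loop above compares each sublist against everything collected so far, which can get slow for long lists. As a possible alternative, here is a minimal sketch that tracks already-seen sublists in a set by converting each one to a tuple. The function name `remove_duplicates_nested_fast` is made up for this example, and the sketch assumes the sublists contain only hashable items (numbers, strings, and so on) with a single level of nesting.

```python
def remove_duplicates_nested_fast(lst):
    seen = set()      # tuples of the sublists we have already kept
    result = []
    for sublist in lst:
        key = tuple(sublist)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            result.append(sublist)
    return result

my_list = [[1, 2, 3], [3, 4, 5], [1, 2, 3], [6, 7, 8]]
result = remove_duplicates_nested_fast(my_list)
print(result)  # [[1, 2, 3], [3, 4, 5], [6, 7, 8]]
```

Like the loop version, this keeps the first occurrence of each sublist and preserves the original order.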
3. How to remove duplicates when randomizing a list
You can randomly extract items from a Python list while removing duplicates by following the steps below.
- Create a new list with duplicates removed.
- Randomly extract items from the new list.
Here is some sample code that implements these steps:
import random

def get_random_elements(lst, num_elements):
    unique_elements = list(set(lst))
    random_elements = random.sample(unique_elements, num_elements)
    return random_elements

my_list = [1, 2, 3, 3, 4, 2, 1]
result = get_random_elements(my_list, 3)
print(result)
# Output (example; the result varies between runs)
[1, 4, 2]
The code above uses set() to remove duplicates, then converts the result with list() to create a new list, unique_elements.
We then use the random.sample() function to build random_elements by randomly picking num_elements items from unique_elements.
Finally, the function returns random_elements.
The random.sample() function picks the requested number of items without replacement, so no position in the list is chosen twice. Because the duplicates were removed first, the returned values are also unique.
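One edge case worth knowing: random.sample() raises a ValueError when asked for more items than the list contains, which can happen here because deduplication shrinks the list. The sketch below shows one way to guard against that by capping the request. The function name `get_random_unique_elements` and the capping behavior are assumptions made for this illustration, not part of the original example.

```python
import random

def get_random_unique_elements(lst, num_elements):
    unique_elements = list(set(lst))
    # random.sample() raises ValueError if asked for more items than exist,
    # so cap the request at the number of unique items.
    k = min(num_elements, len(unique_elements))
    return random.sample(unique_elements, k)

my_list = [1, 2, 3, 3, 4, 2, 1]

picked = get_random_unique_elements(my_list, 3)
print(picked)        # three distinct values; which ones varies per run

capped = get_random_unique_elements(my_list, 10)
print(len(capped))   # 4, because only four unique values exist
```

Depending on your use case, you may prefer to let the ValueError propagate instead of silently returning fewer items.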
4. Keep order when removing duplicates
Deduplicating with the set() function ignores the existing order of the list.
Let's see how to avoid this and preserve the order when deduplicating.
In Python, you can use the OrderedDict class from the collections module together with list() to remove duplicates from a list while preserving the order.
Here's a code example that uses them:
from collections import OrderedDict

def remove_duplicates(lst):
    return list(OrderedDict.fromkeys(lst))

my_list = [1, 2, 3, 3, 4, 2, 1]
result = remove_duplicates(my_list)
print(result)
# Output
[1, 2, 3, 4]
In the code above, OrderedDict.fromkeys() creates an ordered dictionary whose keys are the list items. Because dictionary keys are unique, duplicates are dropped.
An OrderedDict is a dictionary-like class that remembers the order in which items were inserted, so each item keeps the position of its first occurrence.
We then use the list() function to convert the dictionary keys back to a list.
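As a side note, regular dictionaries also preserve insertion order from Python 3.7 onward, so the same idea works without importing anything. The sketch below assumes Python 3.7 or later; the function name `remove_duplicates_keep_order` and the sample list are made up for this example, chosen so that the effect on order is visible.

```python
def remove_duplicates_keep_order(lst):
    # dict keys are unique and, from Python 3.7 on, keep insertion order.
    return list(dict.fromkeys(lst))

my_list = [3, 1, 3, 2, 1, 4]
result = remove_duplicates_keep_order(my_list)
print(result)  # [3, 1, 2, 4]
```

Each item stays at the position of its first occurrence, which is the same behavior as the OrderedDict version.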
5. Remove duplicates and sort list
To deduplicate and sort a list in Python, you can do the following:
- Create a new list with the duplicates removed.
- Sort the newly created list.
Here is some sample code that implements these steps:
def remove_duplicates_and_sort(lst):
    unique_elements = list(set(lst))
    sorted_elements = sorted(unique_elements)
    return sorted_elements

my_list = [3, 2, 1, 4, 3, 2, 1]
result = remove_duplicates_and_sort(my_list)
print(result)
# Output
[1, 2, 3, 4]
In the code above, we use set() to remove duplicates, then convert the result with list() to create a new list, unique_elements.
We then use the sorted() function to create sorted_elements, which holds the items of unique_elements in ascending order.
Finally, the function returns sorted_elements.
The sorted() function returns a new sorted list and leaves its input unchanged. For more information, see the post Sorting lists.
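Since sorted() accepts any iterable, the intermediate list() call is optional. The sketch below is a shorter variant of the same idea; the `descending` parameter is an addition made for this illustration and is not part of the original example.

```python
def remove_duplicates_and_sort(lst, descending=False):
    # sorted() accepts any iterable, so the set can be passed in directly.
    return sorted(set(lst), reverse=descending)

my_list = [3, 2, 1, 4, 3, 2, 1]
print(remove_duplicates_and_sort(my_list))                   # [1, 2, 3, 4]
print(remove_duplicates_and_sort(my_list, descending=True))  # [4, 3, 2, 1]
```

Because the result is sorted anyway, the fact that set() does not preserve order does not matter here.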
6. How to deduplicate two lists
How to remove duplicates from two lists is covered in Section 4 of the post Join lists.
Please read that section.
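For quick reference, here is one possible sketch that combines two lists and drops the duplicates while keeping first-occurrence order. It is not necessarily the approach used in the Join lists post; the function name `merge_without_duplicates` and the sample lists are made up for this example, and it assumes Python 3.7 or later so that dict keys keep insertion order.

```python
def merge_without_duplicates(first, second):
    # Concatenate the lists, then let dict keys drop the repeats
    # while keeping the order of first occurrence.
    return list(dict.fromkeys(first + second))

list_a = [1, 2, 3, 4]
list_b = [3, 4, 5, 6]
merged = merge_without_duplicates(list_a, list_b)
print(merged)  # [1, 2, 3, 4, 5, 6]
```

If order does not matter, list(set(list_a) | set(list_b)) gives the same items using a set union.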
Conclusion
In this post, I have tried to answer some common questions about removing duplicates from a Python list.
Hopefully it will help you in your practical work.
