All Questions
1,084
questions
0
votes
1
answer
47
views
Pyspark Filtering Array inside a Struct column
I have a column in my Spark DataFrame that has this schema:
root
|-- my_feature_name: struct (nullable = true)
| |-- first_profiles: map (nullable = true)
| | |-- key: string
| | |--...
1
vote
2
answers
40
views
How to keep the first appearance of a value while filtering everything else out in R?
This is what my dataset currently looks like. I want to keep patient 1's data until the first '1' occurs in 'test.result', then remove any rows for patient 1 after that.
...
1
vote
2
answers
42
views
How to get column-wise summary statistics with missing codes?
I have written a custom function ord_table() to extract summary statistics from a series of databases. To get those summary statistics, I have to filter out missing data codes (all codes are large ...
1
vote
1
answer
59
views
Conditional filtering of dataframe in R
How can I use dplyr::filter() on my DATA to keep the rows for IDs whose Language value changes from another language when 'Type!=5F' to "English" when 'Type==5F'?
For example, ID==1 has ...
0
votes
1
answer
45
views
How to write a function to read csv files with different separators in pandas in Python?
I have a bunch of CSV files for different years named my_file_2019, my_file_2020, my_file_2023 and so on. Some files use a tab separator while others use a semicolon.
I want to write a common function ...
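A minimal sketch of one way such a common function could look, assuming the separator can be guessed from the header line. The function name `read_csv_any_sep` and the candidate list are hypothetical, not from the question.

```python
import io
import pandas as pd

def read_csv_any_sep(source, candidates=("\t", ";", ",")):
    """Read a delimited file, guessing the separator from the header line.

    `candidates` is an assumed list of possible separators; the one that
    occurs most often in the first line wins.
    """
    if hasattr(source, "read"):
        # Already-open text buffer: peek at the first line, then rewind.
        header = source.readline()
        source.seek(0)
    else:
        # Otherwise treat `source` as a file path.
        with open(source, "r", encoding="utf-8") as fh:
            header = fh.readline()
    sep = max(candidates, key=header.count)
    return pd.read_csv(source, sep=sep)

# Small in-memory demo with made-up data instead of real files.
tab_df = read_csv_any_sep(io.StringIO("year\tvalue\n2019\t10\n"))
semi_df = read_csv_any_sep(io.StringIO("year;value\n2019;10\n"))
```

Another option is `pd.read_csv(path, sep=None, engine="python")`, which asks pandas to sniff the delimiter itself.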
2
votes
4
answers
92
views
Filter DataFrame events not in time windows DataFrame
I have a DataFrame of events (Event Name - Time) and a DataFrame of time windows (Start Time - End Time).
I want to get a DataFrame containing only the events not in any of the time windows.
I am ...
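A minimal sketch of one possible approach, assuming pandas DataFrames. The column names `event`, `time`, `start` and `end` are hypothetical, and treating window boundaries as inclusive is also an assumption.

```python
import pandas as pd

def events_outside_windows(events, windows):
    """Keep only the events whose time falls in none of the windows."""
    # Shape (n_events, 1) so it broadcasts against the (n_windows,) arrays.
    t = events["time"].to_numpy()[:, None]
    start = windows["start"].to_numpy()
    end = windows["end"].to_numpy()
    # True where an event lies inside at least one window (bounds inclusive).
    inside_any = ((t >= start) & (t <= end)).any(axis=1)
    return events[~inside_any]

# Made-up example data.
events = pd.DataFrame({
    "event": ["a", "b", "c"],
    "time": pd.to_datetime(
        ["2023-01-01 09:00", "2023-01-01 10:30", "2023-01-01 12:00"]
    ),
})
windows = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-01 10:00"]),
    "end": pd.to_datetime(["2023-01-01 11:00"]),
})
outside = events_outside_windows(events, windows)
```

The comparison builds an events-by-windows boolean matrix, so it is simple but memory-hungry when both frames are large.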
1
vote
3
answers
46
views
Finding rows in a data.frame that are the same on one variable but different on another variable in R
In my DATA below, how could I filter the rows where the Nm values are the same but Descr values are different to achieve my Desired_out below?
DATA <- read.table(header=T, text ="
Cd Nm ...
1
vote
3
answers
82
views
Remove elements with same prefix from a string in a column
and thank you for being part of an awesome community of learners and explorers! :) I have an issue with removing elements from a character column. This answer pointed me in the right direction (split ...
1
vote
5
answers
79
views
How to filter out the top Nth value of each row in an R dataframe?
I have an R dataframe and want to filter out the top Nth value of each row, with the remaining values set to 0.
For example, suppose the dataframe looks like the following table:
0.5 0.3 0.2 0.15 0.9
...
0
votes
2
answers
37
views
Pandas - Filter - TypeError: 'in <string>' requires string as left operand, not list
I am exploring the pandas filter method and came across this error when running the query below:
df2.filter(like = ['Republic','United'], axis=0 )
How do I provide a list to the like parameter in ...
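For context, `like` in `DataFrame.filter` takes a single string, which is why passing a list raises the TypeError. A minimal sketch of one workaround is to join the terms into a pattern for the `regex` parameter. The example index labels and the `population` column below are made up.

```python
import re
import pandas as pd

# Hypothetical data: the row labels stand in for the question's index.
df2 = pd.DataFrame(
    {"population": [1, 2, 3, 4]},
    index=["Czech Republic", "United States", "France", "United Kingdom"],
)

terms = ["Republic", "United"]
# Escape each term so it is matched literally, then OR them together.
pattern = "|".join(re.escape(term) for term in terms)

# Keep the rows whose label contains any of the terms.
result = df2.filter(regex=pattern, axis=0)
```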
2
votes
3
answers
69
views
Extracting entries in a dataframe corresponding to n smallest positive values and n largest negative values of a certain variable in R
Imagine I have a table like the following one.
set.seed(12)
table =
data.frame(
value = rnorm(n = 10),
par = runif(n = 10, min = - 1, max = 1)
)
How can I extract the entries of value ...
0
votes
1
answer
43
views
Filtering Dataframe for first letter of a Classification Code in a column entry that has multiple of them
I am trying to filter a DataFrame of patents by their classification codes. I want to only get the patents that have a specific first letter in the code, but each column entry has multiple of these ...
0
votes
0
answers
53
views
Pandas filtering yields null values
I have a df which I am filtering based on a particular column. This is my code to do that
test_table[(test_table['Item']=='1')]
Ideally this code should return the rows where the value of column '...
1
vote
0
answers
30
views
Is there another way to select rows with null values within a function in python? [duplicate]
I've been working on cleaning a dataset and so far I've been using
missing_df = main_df[main_df.col.isnull()]
to select rows with null values under 'col'.
Since I'm using this a lot I'm trying to put ...
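A minimal sketch of how the selection could be wrapped in a function, assuming the column name is passed as a string. The helper name `rows_with_nulls` and the example data are hypothetical. Bracket indexing is used because attribute access (`main_df.col`) cannot take a variable column name.

```python
import pandas as pd

def rows_with_nulls(df, col):
    """Return the rows of `df` where column `col` is null."""
    return df[df[col].isnull()]

# Made-up example data.
main_df = pd.DataFrame({
    "col": [1.0, None, 3.0],
    "other": ["x", "y", None],
})
missing_df = rows_with_nulls(main_df, "col")
```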
3
votes
1
answer
55
views
How to filter dataframe column names containing 2 specified substrings?
I need the column names from the dataframe that contain both of the terms software and packages.
I'm able to filter out columns containing one string, e.g.:
software_cols = df.filter(regex='Software|...
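A minimal sketch of one way to require both substrings, using a plain list comprehension instead of a regex. The column names are made up, and matching case-insensitively is an assumption.

```python
import pandas as pd

# Hypothetical column names for illustration only.
df = pd.DataFrame(columns=[
    "Software Packages Used",
    "Software Version",
    "Packages Installed",
    "packages_for_software",
    "Hardware",
])

required = ("software", "packages")
# Keep a column only if every required term appears in its lowercased name.
both_cols = [
    col for col in df.columns
    if all(term in col.lower() for term in required)
]
```

`df[both_cols]` then selects just those columns.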