Finding strings with common character

I would like to find, in a text file, all unique strings that share a common first character, e.g. "G". By unique I mean without repetition: if the same string occurs several times, I need to print it only once.
Any help would be appreciated.
1 comment
madhan ravi 2023-12-22
Edited: madhan ravi 2023-12-22
Give an example or attach your text file and show the expected result.

Accepted Answer

Hassaan 2023-12-22
Edited: Hassaan 2023-12-22
You can use a regular expression to separate the strings and then filter out the unique ones that start with 'G'.
% Specify the file name and the common character
filename = 'yourfile.txt'; % Replace with your text file name
commonChar = 'G'; % Replace with the common character you're looking for
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the entire file content as a single string
fileContent = fscanf(fileID, '%c');
fclose(fileID); % Close the file after reading
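% Use a regular expression to extract strings that start with commonChar
% (regexptranslate escapes commonChar safely; a bare '\' prefix would turn
% letters such as 'w' or 'd' into character classes)
pattern = [regexptranslate('escape', commonChar) '\w*'];
allMatches = regexp(fileContent, pattern, 'match');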
% Find unique strings
uniqueStrings = unique(allMatches);
% Print the unique strings
disp(['Unique strings starting with the character ' commonChar ':']);
for i = 1:length(uniqueStrings)
    disp(uniqueStrings{i});
end
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
Output:
Unique strings starting with the character G:
G123
G123Gabc
G321Yo
Gabc
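Note that the pattern above also picks up a G in the middle of a word (G321Yo comes from Yo123G321Yo). If only words that begin with G are wanted, one possible variant is to anchor the match with \<, which MATLAB's regexp treats as the start of a word. This is only a sketch under that assumption about the desired result; the sample text is built inline with sprintf rather than read from a file.

```matlab
% Sample text standing in for the file content (same lines as the example input)
fileContent = sprintf('Gabc\nabcde\nG123\nG123Gabc\nG123Gabc\nG123Gabc\nYo123G321Yo');
commonChar = 'G';

% \< anchors the match at the start of a word; regexptranslate escapes
% commonChar in case it has a special meaning in a regular expression
pattern = ['\<' regexptranslate('escape', commonChar) '\w*'];
wordStartMatches = unique(regexp(fileContent, pattern, 'match'));

% Print one string per line: G123, G123Gabc, Gabc (G321Yo is no longer matched)
for i = 1:numel(wordStartMatches)
    disp(wordStartMatches{i});
end
```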
------------------------------------------------------------------------------------------------------------------------------------------------
If you find the solution helpful and it resolves your issue, it would be greatly appreciated if you could accept the answer. Leaving an upvote and a comment are also wonderful ways to provide feedback.
2 comments
Hassaan 2023-12-22
One of the many approaches without using regexp:
% The character to search for
searchChar = 'G';
% Specify the file name
filename = 'code.txt'; % Replace with your text file name
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the entire file content as a single string
fileContent = fscanf(fileID, '%c');
fclose(fileID); % Close the file after reading
% Remove newlines and carriage returns
fileContent = strrep(fileContent, newline, '');
fileContent = strrep(fileContent, char(13), ''); % Carriage return
% Split the text into individual words assuming 'G' is the delimiter
words = strsplit(fileContent, searchChar);
% Reattach 'G' to the start of each non-empty word
words = words(~cellfun('isempty', words));
words = strcat(searchChar, words);
% Find unique words that start with 'G'
uniqueWords = unique(words);
% Print the unique strings, one per line
disp(['Unique strings starting with the character ' searchChar ':']);
for i = 1:length(uniqueWords)
    disp(uniqueWords{i});
end
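This approach splits the whole file content at every occurrence of searchChar, removes the empty entries that result from strsplit, and puts searchChar back at the front of each remaining piece. It then finds the unique pieces and prints them. Because line breaks are removed before splitting, a piece can run across two lines (see Gabcabcde in the output below), and any text that precedes the first searchChar in the file would also get searchChar prepended. Make sure to adjust the filename to the actual file you're reading from.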
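Since the line breaks are stripped before the split, the output for this approach contains Gabcabcde, which joins two input lines. If that is not wanted, one possible variant without regexp is to work line by line and cut each line at the positions returned by strfind. This is a sketch, assuming a fragment should end at the next searchChar or at the end of its line; the lines cell array is a hypothetical stand-in for the file content.

```matlab
searchChar = 'G';
% Sample lines standing in for the file content (assumption for illustration)
lines = {'Gabc', 'abcde', 'G123', 'G123Gabc', 'Yo123G321Yo'};

found = {};
for k = 1:numel(lines)
    thisLine = lines{k};
    idx = strfind(thisLine, searchChar);   % start position of every fragment
    if isempty(idx)
        continue;                          % no searchChar on this line
    end
    % Each fragment ends just before the next searchChar or at the line end
    stops = [idx(2:end) - 1, numel(thisLine)];
    for j = 1:numel(idx)
        found{end+1} = thisLine(idx(j):stops(j)); %#ok<SAGROW>
    end
end

uniqueWords = unique(found);   % G123, G321Yo, Gabc
for i = 1:numel(uniqueWords)
    disp(uniqueWords{i});
end
```

Text before the first searchChar on a line (such as Yo123) is skipped here instead of being prefixed with searchChar.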
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
G123GabcYo123G321Yo
Output:
Unique strings starting with the character G:
G123
G321Yo
Gabc
GabcYo123
Gabcabcde
UWM 2023-12-22
Thank you very much for the help. It works perfectly.

More Answers (3)

Steven Lord 2023-12-22
Read the data into MATLAB, split it into separate words if necessary, then use startsWith to determine which words start with your desired character.
L = readlines('bench.dat');
oneLine = L(1) % Just operate on the first line
oneLine = "MATLAB(R) Benchmark Data."
s = split(oneLine)
s = 3×1 string array
    "MATLAB(R)"
    "Benchmark"
    "Data."
startsWithB = startsWith(s, "B")
startsWithB = 3×1 logical array
    0
    1
    0
wordStartingWithB = s(startsWithB)
wordStartingWithB = "Benchmark"
The unique function likely will be useful to you as well.
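As a small illustration of that last point: unique returns its result sorted by default, and the 'stable' option keeps the order of first appearance instead. The sketch below uses a made-up sample string (not the bench.dat file above) to show both.

```matlab
% Made-up sample text for illustration
txt = "Gabc abc G123 Gabc Gdef G123";
words = split(txt);                        % 6x1 string array of words
gWords = words(startsWith(words, "G"));    % keep words starting with "G"

sortedUnique = unique(gWords)              % sorted: "G123", "Gabc", "Gdef"
fileOrderUnique = unique(gWords, 'stable') % first-seen order: "Gabc", "G123", "Gdef"
```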

Hassaan 2023-12-22
Edited: Hassaan 2023-12-22
To achieve this in MATLAB, you would typically read the text file into a string array or cell array, then use string manipulation functions to find and list the unique strings. Here's a step-by-step guide with code snippets:
Read the Text File: Load the contents of the text file into MATLAB.
filename = 'yourfile.txt'; % Replace with your text file name
fileID = fopen(filename, 'r');
data = textscan(fileID, '%s');
fclose(fileID);
extractedStrings = data{1};
Filter Strings by First Character: Find strings that start with the specified character.
commonChar = 'G'; % Replace with the common character you're looking for
startsWithG = strncmp(extractedStrings, commonChar, 1);
filteredStrings = extractedStrings(startsWithG);
Find Unique Strings: Get the unique strings from the filtered list.
uniqueStrings = unique(filteredStrings);
Print Unique Strings: Display or print the unique strings.
disp(uniqueStrings);
In MATLAB, you can run this script after replacing 'yourfile.txt' with the actual path to your text file and commonChar with the character you're interested in. It prints all unique whitespace-separated strings that start with that character, displaying each string only once.
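The comparison with strncmp is case-sensitive, so 'gamma' would not count as starting with 'G'. If lowercase matches should be included (an assumption; the question only mentions "G"), strncmpi can be swapped in. A minimal sketch with a made-up cell array in place of the textscan result:

```matlab
% Made-up strings standing in for data{1} from textscan
extractedStrings = {'Gabc'; 'gamma'; 'abcde'; 'G123'; 'Gabc'};
commonChar = 'G';

% strncmpi compares the first character while ignoring case
startsWithCommonChar = strncmpi(extractedStrings, commonChar, 1);
uniqueStrings = unique(extractedStrings(startsWithCommonChar));

disp(uniqueStrings);   % G123, Gabc, gamma
```

The original capitalization is kept, so 'Gabc' and 'gabc' would still be listed as two different strings.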
Full Code:
% Specify the file name and the common character
filename = 'yourfile.txt'; % Replace with your text file name
commonChar = 'G'; % Replace with the common character you're looking for
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the content of the file into a cell array of strings
data = textscan(fileID, '%s');
fclose(fileID); % Close the file after reading
extractedStrings = data{1}; % Extract the strings from the cell array
% Filter strings by the first character
startsWithCommonChar = strncmp(extractedStrings, commonChar, 1);
% Get the unique strings that start with the specified character
filteredStrings = extractedStrings(startsWithCommonChar);
uniqueStrings = unique(filteredStrings);
% Print the unique strings
disp('Unique strings starting with the specified character:');
disp(uniqueStrings);
Input file content:
Gabc
abcde
G123
Output:
Unique strings starting with the specified character:
{'G123'}
{'Gabc'}
If you need the output as a simple list without the curly braces and single quotes, you can loop through the cell array and print each string:
disp('Unique strings starting with the character G:');
for i = 1:length(uniqueStrings)
    disp(uniqueStrings{i});
end
Input file content:
Gabc
abcde
G123
Output:
Unique strings starting with the character G:
G123
Gabc
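The loop can also be replaced by a single fprintf call, since fprintf reuses its format for every argument when the cell array is expanded with {:}. A short sketch, with the cell array filled in by hand to match the example output:

```matlab
% Hand-written stand-in for the result of unique(filteredStrings)
uniqueStrings = {'G123'; 'Gabc'};

fprintf('Unique strings starting with the character G:\n');
% The format '%s\n' is applied once per element of uniqueStrings
fprintf('%s\n', uniqueStrings{:});
```

If uniqueStrings could be empty, it may be worth guarding the second call, because fprintf('%s\n') with no arguments still prints a blank line.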
4 comments
Dyuman Joshi 2023-12-22
You've only updated for the 2nd point I raised.
Say the input is -
G123Gabc
Yo123G321Yo
What should be the output then?
Hassaan 2023-12-22
Edited: Hassaan 2023-12-22
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
Output:
Unique strings starting with the character G:
G123
G123Gabc
G321Yo
Gabc
I have posted the new code snippet as a new answer.

Paul 2023-12-22
type Gfile.txt
Gabc abc abc Gabc Gdef Gdef Gabc GGG Gxyz
% assuming strings to return are space delimited
text = split(string(fileread('Gfile.txt')));
unique(text(startsWith(text,"G")))
ans = 4×1 string array
    "GGG"
    "Gabc"
    "Gdef"
    "Gxyz"
