# String Operations - Transforming lower case letters to upper case between certain symbols

• Python
Gold Member
Let us suppose I have a string in this form,

string = 'CıC CkCnow CwCho CyCou CaCre but CwChat CaCm CıC'

Now I want to take whatever is between each pair of 'C's and convert it to upper case. For example, the above string should turn into

new_string = 'I Know Who You Are but What Am I'

What kind of algorithm is best for this job? I have come up with something, but it seems really long and inefficient. Any ideas?


Mentor
I have come up with something but it seems really long and inefficient.
It's going to be hard for us to tell whether or not we agree with you if we don't see the algorithm.

Homework Helper
Gold Member
The 'sub' function in Python allows you to have it call a function for each string that matches the pattern and return the string that you want to be substituted. That is what you want so that you can replace 'CxC' with 'X'. See this for a description.
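As a rough illustration of the callback form of re.sub described above, here is a minimal sketch. It assumes a single character between each pair of 'C's, as in the first post; the helper name upper_match is made up for the example.

```python
import re

def upper_match(match):
    # match.group(1) is the text captured by the parentheses in the pattern.
    return match.group(1).upper()

text = "CkCnow CwCho"

# re.sub calls upper_match once for every non-overlapping match of C(.)C
# and substitutes whatever string the function returns.
result = re.sub(r"C(.)C", upper_match, text)
print(result)  # Know Who
```

The function receives a match object rather than a plain string, which is why the captured letter is read with group(1).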

Gold Member
It's going to be hard for us to tell whether or not we agree with you if we don't see the algorithm.
My algorithm was to take the index of each letter C, pair the indices up two at a time, and then convert the characters between each pair to upper case. However, it is taking too long...
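The poster's actual code was never shared, so the following is only one possible reading of the index-pairing approach described above. The function name upper_between_markers and its marker parameter are hypothetical.

```python
def upper_between_markers(text, marker="C"):
    # Collect the position of every marker character.
    positions = [i for i, ch in enumerate(text) if ch == marker]
    pieces = []
    cursor = 0
    # Pair the positions two at a time: (open, close), (open, close), ...
    for start, end in zip(positions[0::2], positions[1::2]):
        pieces.append(text[cursor:start])            # text before the opening marker
        pieces.append(text[start + 1:end].upper())   # text between the pair, uppercased
        cursor = end + 1                             # skip past the closing marker
    pieces.append(text[cursor:])                     # whatever follows the last pair
    return "".join(pieces)

print(upper_between_markers("CaCarmanCpopC"))  # AarmanPOP
```

If the string contains an odd number of markers, the last unpaired one is left in place in this sketch.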

Gold Member
The 'sub' function in Python allows you to have it call a function for each string that matches the pattern and return the string that you want to be substituted. That is what you want so that you can replace 'CxC' with 'X'. See this for a description.
It looks good. I'll try this.

Staff Emeritus
regex 2021.8.3 is a python package that you can install and use.

Homework Helper
Gold Member
The hardest part, in general, would be to distinguish between a 'CxC' pattern that you want to replace versus an acronym with two 'C's that should stay as is. IMO, it is a mistake to use a normal ASCII character like 'C' as a special non-ASCII indicator with a special meaning.
(Also document section headers are sometimes all capitalized and might have character patterns that you don't want to replace.)

Homework Helper
Gold Member
Python is not known for speed. I assume that your algorithm is fairly simple and that Python is just slow. I think there are ways to pre-compile Python so that it will be faster. If this program is to be used for large quantities of text processing, you might want to do that part with a separate program in a faster language. If this is to be used for large text documents, you might be surprised at how many things occur in text documents that require more logic than you anticipated.

Gold Member
The hardest part, in general, would be to distinguish between a 'CxC' pattern that you want to replace versus an acronym with two 'C's that should stay as is. IMO, it is a mistake to use a normal ASCII character like 'C' as a special non-ASCII indicator with a special meaning.
(Also document section headers are sometimes all capitalized and might have character patterns that you don't want to replace.)
That's no problem in my case. All the strings that I will work with are lowercase.

Gold Member
Python is not known for speed. I assume that your algorithm is fairly simple and that Python is just slow. I think there are ways to pre-compile Python so that it will be faster. If this program is to be used for large quantities of text processing, you might want to do that part with a separate program in a faster language. If this is to be used for large text documents, you might be surprised at how many things occur in text documents that require more logic than you anticipated.
Well, I don't know any language other than Python, and I kind of need it to work in Python. But the text will not be very long, so it won't be a problem.

Gold Member
I tried to use

Code:
def my_replace(m):
    if <some condition>:
        return <replacement variant 1>
    return <replacement variant 2>

result = re.sub("\w+", my_replace, input)

but I couldn't make it work..any ideas ?

Homework Helper
Gold Member
I tried to use

Code:
def my_replace(m):
    if <some condition>:
        return <replacement variant 1>
    return <replacement variant 2>

result = re.sub("\w+", my_replace, input)

but I couldn't make it work..any ideas ?
This is just pseudocode. You need to replace it with real Python code appropriate for your problem. Exactly what code did you try?

Gold Member
This is just pseudocode. You need to replace it with real Python code appropriate for your problem.
Yes indeed. I just don't know how to use the re module. It says I can pass it a function, but I am not sure how to use that function.
Exactly what code did you try?
Not worth sharing since it's not useful.

Homework Helper
Gold Member
I'm starting to suspect that this is a Python homework problem because it seems very artificial. In that case, I will only give hints on how to modify your Python code.

In case it is not a Python homework problem, below is some Perl code that will work. Put the original text in the file temp2.txt and it will print the modified result to STDOUT.

Perl:
$string = `type temp2.txt`;
$string =~ s/(C(\w)C)/uc($2)/ge;
print "$string\n";

Gold Member
This is what you want:

Python:
import re

def to_camel_case(match):
    if match.group(1) is not None:
        return match.group(1).upper()

old_str = 'CıC CkCnow CwCho CyCou CaCre but CwChat CaCm CıC'
new_str = re.sub(r"C([^C])C", to_camel_case, old_str)

print(new_str)

I will leave it as an exercise for you to understand how it works.

But this is a much more useful (and fun!) use of regular expressions and python (note that there are no 'C' in the original string):

Python:
import re

def to_camel_case(match):
    if match.group(2) is not None:
        if match.group(2) not in ['but', 'and', 'of']:
            return match.group(1) + match.group(3).upper() + match.group(4)
        else:
            return match.group(1) + match.group(2)

old_str = 'ı know who you are but what am ı'
new_str = re.sub(r"(^|[\s.,;:!?()])(([^\s.,;:!?()])([^\s.,;:!?()]*))(?=$|[\s.,;:!?()])", to_camel_case, old_str)

print(new_str)

If I was more fluent in python, I could make a better regular expression than that, but re seems to use a limited version. The best tool to learn about regular expression is regex101.com.

Gold Member
I'm starting to suspect that this is a Python homework problem because it seems very artificial.
Well, I am going to use it somewhere, but it's not homework.
In case it is not a Python homework problem, below is some Perl code that will work. Put the original text in the file temp2.txt and it will print the modified result to STDOUT.
I did not ask for Perl code. I don't know Perl or how to run it.

Gold Member
This is what you want:

Python:
import re

def to_camel_case(match):
    if match.group(1) is not None:
        return match.group(1).upper()

old_str = 'CıC CkCnow CwCho CyCou CaCre but CwChat CaCm CıC'
new_str = re.sub(r"C([^C])C", to_camel_case, old_str)

print(new_str)

I will leave it as an exercise for you to understand how it works.

But this is a much more useful (and fun!) use of regular expressions and python (note that there are no 'C' in the original string):

Python:
import re

def to_camel_case(match):
    if match.group(2) is not None:
        if match.group(2) not in ['but', 'and', 'of']:
            return match.group(1) + match.group(3).upper() + match.group(4)
        else:
            return match.group(1) + match.group(2)

old_str = 'ı know who you are but what am ı'
new_str = re.sub(r"(^|[\s.,;:!?()])(([^\s.,;:!?()])([^\s.,;:!?()]*))(?=$|[\s.,;:!?()])", to_camel_case, old_str)

print(new_str)

If I was more fluent in python, I could make a better regular expression than that, but re seems to use a limited version. The best tool to learn about regular expression is regex101.com.
It's nice, but it does not work for this case:

xstr = 'CaCarmanCpopC'

It should have produced

xstr = 'AarmanPOP'

but instead it gives

AarmanCpopC

so it ignores the other C values.

Mentor
My algorithm was to take the index of each C letter. Pair them as 2. Get index values between them and then turn strings into uppercase letters based on these index.
That's basically what the regex version is doing.

However, it is taking too long...
That's because Python is doing it in bytecode instructions, whereas the regex version is using the underlying C implementation for regular expressions, which will be a lot faster. But the algorithm itself is basically the same either way. There's no magical shortcut to finding the "C"s and uppercasing the letters between them.
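One way to check this on your own machine is the standard timeit module. The sketch below makes no claim about the numbers you will see; with_regex and with_loop are hypothetical names, and the loop is just one hand-written variant, not the poster's code.

```python
import re
import timeit

PATTERN = re.compile(r"C([^C]+)C")

def with_regex(text):
    # The matching loop runs inside the re module's C implementation.
    return PATTERN.sub(lambda m: m.group(1).upper(), text)

def with_loop(text):
    # The same job done character by character in Python bytecode.
    out = []
    inside = False
    for ch in text:
        if ch == "C":
            inside = not inside  # each C toggles between inside and outside a pair
        else:
            out.append(ch.upper() if inside else ch)
    return "".join(out)

sample = "CaCarmanCpopC " * 1000

# Both versions should agree on well-formed input before they are timed.
assert with_regex(sample) == with_loop(sample)

print("regex:", timeit.timeit(lambda: with_regex(sample), number=100))
print("loop: ", timeit.timeit(lambda: with_loop(sample), number=100))
```

The two functions only agree when every 'C' is paired and something sits between each pair, which is the situation described in this thread.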

Mentor
I think there are ways to pre-compile Python so that it will be faster.
If you're running Python bytecode, you're running Python bytecode. "Pre-compiling", for Python, just means compiling Python source code to bytecode in advance. That won't make much difference compared to the overhead of bytecode while actually running the algorithm.

There is the option of trying other interpreters, such as PyPy, that use various tricks to optimize how Python bytecode is run. For this problem, with a short string, that probably won't do much; but for a very large body of text, it might since the PyPy optimizer will have more opportunities to optimize.

Gold Member
That's basically what the regex version is doing.
But my code takes more than 10-20 lines; regex might take 5 lines, maybe less.

That's because Python is doing it in bytecode instructions, whereas the regex version is using the underlying C implementation for regular expressions, which will be a lot faster. But the algorithm itself is basically the same either way. There's no magical shortcut to finding the "C"s and uppercasing the letters between them.
I did not mean in terms of speed of the running time but in terms of me writing the code :)

Code:
re.sub(r'C(\w)C', lambda s: s.group(1).upper(), xstr)
This seems to be working, but it fails when there is more than one letter between a pair of C's. Such as,

For

xstr = 'CaCCvaC'

the above code produces

a = 'ACvaC'

but it should produce AVA.

Homework Helper
Gold Member
It's nice, but it does not work for this case:

xstr = 'CaCarmanCpopC'

It should have produced

xstr = 'AarmanPOP'

but instead it gives

AarmanCpopC

so it ignores the other C values.
Your example didn't have anything with multiple letters between the 'C's.
In line 8 try new_str = re.sub(r"C([^C]+)C", to_camel_case, old_str)
or new_str = re.sub(r"C(\w+)C", to_camel_case, old_str)
Unfortunately, this will be fooled by any pair of 'C's that are part of real words. So it is most useful if there are no capital letters in the real text.
You may need to get familiar with Python regular expressions and try some things to get it to work the way you want it to.
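The two suggested patterns do not behave the same way on the example above, which a quick check shows. This is only a sketch; the helper name upper and the variable names are made up for the example.

```python
import re

def upper(match):
    return match.group(1).upper()

xstr = "CaCarmanCpopC"

# [^C]+ can never run past a 'C', so each pair is matched separately.
not_c = re.sub(r"C([^C]+)C", upper, xstr)
print(not_c)   # AarmanPOP

# \w also matches 'C' and + is greedy, so the capture stretches from the
# first 'C' to the last 'C' in the run of word characters.
greedy = re.sub(r"C(\w+)C", upper, xstr)
print(greedy)  # ACARMANCPOP
```

So of the two, only the [^C]+ form gives the result asked for on this input.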

Gold Member
Your example didn't have anything with multiple letters between the 'C's.
In line 8 try new_str = re.sub(r"C([^C]+)C", to_camel_case, old_str)
or new_str = re.sub(r"C(\w+)C", to_camel_case, old_str)
Unfortunately, this will be fooled by any pair of 'C's that are part of real words. So it is most useful if there are no capital letters in the real text.
You may need to get familiar with Python regular expressions and try some things to get it to work the way you want it to.
Guys please. As I have said earlier. In the text I am working on there are no capital letters. So there will be no uppercase C.
All the strings that I will work with are lowercase.

Code:
re.sub(r'C(\w+?)C', lambda s: s.group(1).upper(), xstr)

This code works

have anything with multiple letters between the 'C's.
CpopC was the case

You guys are really helpful, but sometimes I just need some specific things. I know what I am doing. I know the difference between capital C and lowercase c and how the code can mix them up. Maybe I am 'new' to coding, but I know that much.

But my code takes more than 10-20 lines; regex might take 5 lines, maybe less.
We have seen that it takes only 1 line :)

Mentor
my code takes more than 10-20 lines; regex might take 5 lines, maybe less
Yes, because the regex version already has built-in functions that perform the operations you need, so you don't have to code them by hand.

I did not mean in terms of speed of the running time but in terms of me writing the code :)
Yes, I agree that's important. I've found that a great source of innovation in coding is programmer laziness.

Mentor
regex 2021.8.3 is a python package that you can install and use.
Python already has the built-in re module in the standard library.

Mentor
This code works
As long as you're sure the characters in between the C's will all be lower case letters, yes. You could also make the regex more specific for that:

Python:
re.sub('C([a-z]+)C', lambda s: s.group(1).upper(), xstr)

Also, as shown in the example above, if you're sure there will be at least one lower case letter in between each pair of C's, you don't need the question mark in the regex, just the plus sign.
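A small check of the point about the question mark, plus one caveat that may matter for the string in the first post: the dotless 'ı' counts as a word character for \w but lies outside the range a-z. This is a sketch with made-up variable names, not a recommendation of one pattern over the other.

```python
import re

def upper(match):
    return match.group(1).upper()

xstr = "CaCCvaC"

# \w matches 'C' as well, so the lazy +? is needed to stop at the nearest 'C'.
lazy = re.sub(r"C(\w+?)C", upper, xstr)
# [a-z] cannot match 'C', so a plain greedy + already stops in the right place.
ascii_only = re.sub(r"C([a-z]+)C", upper, xstr)
print(lazy, ascii_only)  # AVA AVA

# The dotless i (U+0131) is matched by \w but not by [a-z].
dotless = "CıC CaCm"
lazy_dotless = re.sub(r"C(\w+?)C", upper, dotless)
ascii_dotless = re.sub(r"C([a-z]+)C", upper, dotless)
print(lazy_dotless)   # I Am
print(ascii_dotless)  # CıC Am
```

If the text can contain letters outside plain ASCII, the [^C]+ or lazy \w+? forms seem the safer choice.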

Homework Helper
Gold Member
Guys please. As I have said earlier. In the text I am working on there are no capital letters. So there will be no uppercase C.

Code:
re.sub(r'C(\w+?)C', lambda s: s.group(1).upper(), xstr)

This code works

CpopC was the case

You guys are really helpful, but sometimes I just need some specific things. I know what I am doing. I know the difference between capital C and lowercase c and how the code can mix them up. Maybe I am 'new' to coding, but I know that much.

We have seen that it takes only 1 line :)
Sorry. You will get the best help if you are careful about the initial statement of the problem. The information about no capital letters and the example with more than one letter between the 'C's was not in the first post. It is hard for me to keep up with all the posts to get a clear picture of what is needed.

Gold Member
Could just get your keyboard fixed.