Text Processing
Developing a word processor, or another text-based app, involves handling text, a GUI (Graphical User Interface), file operations, and possibly networking for cloud features. Boost does not provide a library for creating a GUI, so consider a library such as Qt or wxWidgets for that part of your word processor.
Libraries
Here are some Boost libraries that might assist you in processing text:
- Boost.Regex: For simpler parsing tasks, regular expressions can be sufficient and easier to use than full-blown parsing libraries. You could use them to match specific patterns in your input text, such as commands, phrases, or word boundaries.
- Boost.Locale: This library handles and manipulates text in a culturally aware manner. It provides localization and internationalization facilities, allowing your word processor to be used by people with different languages and locales.
- Boost.Spirit: This library is a parser framework that can parse complex data structures. In a word processor, it could be used to interpret different markup and file formats.
- Boost.DateTime: If you need to timestamp changes or edits, or if you are implementing any kind of version history feature, this library can help.
- Boost.Filesystem: This library manipulates files and directories, which is critical in a word processor for opening, saving, and managing documents.
- Boost.Asio: If your word processor has network-related features, such as real-time collaboration or cloud-based storage, Boost.Asio provides a consistent asynchronous model for network programming.
- Boost.Serialization: This library serializes and deserializes data, which could be useful for saving and loading documents in a specific format.
- Boost.Xpressive: A regular expression library whose patterns can be written either as strings or as C++ expressions checked at compile time. It could be useful for implementing features such as search and replace.
- Boost.Algorithm: This library includes a variety of algorithms for string and sequence processing, which can be useful for handling text.
- Boost.MultiIndex: This library maintains a set of items sorted according to multiple keys, which could be useful for implementing features like an index or a sorted list of items.
- Boost.Thread: If your application is multithreaded (for example, if you want to save a document while the user continues to work), this library will be useful.
Note: The code in this tutorial was written and tested using Microsoft Visual Studio (Visual C++ 2022, Console App project) with Boost version 1.88.0.
Sample of Regular Expression Parsing
If the text you are parsing is well-formatted, Boost.Regex is often sufficient. This sample is based on it, rather than on a full-blown parser implementation using Boost.Spirit.
We’ll write a program that scans a string for dates in the format "YYYY-MM-DD" and validates them. The code:
- Finds dates in text
- Validates correct formats (for example, 2024-02-20 is valid, but 2024-15-45 is not)
- Handles multiple dates in a single input string
#include <iostream>
#include <string>
#include <boost/regex.hpp>
// Function to check if a given date is valid (basic validation)
bool is_valid_date(int year, int month, int day) {
if (month < 1 || month > 12 || day < 1 || day > 31) return false;
if ((month == 4 || month == 6 || month == 9 || month == 11) && day > 30) return false;
if (month == 2) {
bool leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
if (day > (leap ? 29 : 28)) return false;
}
return true;
}
// Function to find and validate dates in a text
void find_dates(const std::string& text) {
// Regex pattern: YYYY-MM-DD format
boost::regex date_pattern(R"((\d{4})-(\d{2})-(\d{2}))");
boost::smatch match;
std::string::const_iterator start = text.begin();
std::string::const_iterator end = text.end();
bool found = false;
while (boost::regex_search(start, end, match, date_pattern)) {
int year = std::stoi(match[1]);
int month = std::stoi(match[2]);
int day = std::stoi(match[3]);
if (is_valid_date(year, month, day)) {
std::cout << "Valid date found: " << match[0] << "\n";
} else {
std::cout << "Invalid date: " << match[0] << " (Incorrect month/day)\n";
}
start = match[0].second; // Move to next match
found = true;
}
if (!found) {
std::cout << "No valid dates found in the input text.\n";
}
}
int main() {
std::string input;
std::cout << "Enter a sentence containing dates (YYYY-MM-DD format):\n";
std::getline(std::cin, input);
find_dates(input);
return 0;
}
The following shows a successful parse:
Enter a sentence containing dates (YYYY-MM-DD format):
Today is 2024-02-19, and tomorrow is 2024-02-20.
Valid date found: 2024-02-19
Valid date found: 2024-02-20
And the following shows several unsuccessful parses:
Enter a sentence containing dates (YYYY-MM-DD format):
The deadline is 2024-02-30.
Invalid date: 2024-02-30 (Incorrect month/day)
Enter a sentence containing dates (YYYY-MM-DD format):
There are no dates in this sentence.
No valid dates found in the input text.
Add Robust Date and Time Parsing
The hand-written date validation in the sample above can be replaced by Boost.DateTime, whose date type rejects invalid dates when it is constructed and handles dates and times correctly.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/date_time/gregorian/gregorian.hpp>
namespace greg = boost::gregorian;
// Function to check if a date is valid using Boost.Date_Time
bool is_valid_date(int year, int month, int day) {
try {
greg::date test_date(year, month, day);
return true; // If no exception, it's valid
}
catch (const std::exception&) {
return false; // Invalid date
}
}
// Function to find and validate dates in a text
void find_dates(const std::string& text) {
boost::regex date_pattern(R"((\d{4})-(\d{2})-(\d{2}))");
boost::smatch match;
std::string::const_iterator start = text.begin();
std::string::const_iterator end = text.end();
bool found = false;
while (boost::regex_search(start, end, match, date_pattern)) {
int year = std::stoi(match[1]);
int month = std::stoi(match[2]);
int day = std::stoi(match[3]);
if (is_valid_date(year, month, day)) {
greg::date valid_date(year, month, day);
std::cout << "Valid date found: " << valid_date << "\n";
}
else {
std::cout << "Invalid date: " << match[0] << " (Does not exist)\n";
}
start = match[0].second; // Move to next match
found = true;
}
if (!found) {
std::cout << "No valid dates found in the input text.\n";
}
}
int main() {
std::string input;
std::cout << "Enter a sentence containing dates (YYYY-MM-DD format):\n";
std::getline(std::cin, input);
find_dates(input);
return 0;
}
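Note: The greg::date constructor throws an exception for any date that does not exist, so leap years are handled correctly without custom logic. It also rejects years outside the range 1400 to 9999, which Boost.DateTime does not support.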
The following shows a successful parse:
Enter a sentence containing dates (YYYY-MM-DD format):
Today is 2024-02-29, and tomorrow is 2024-03-01.
Valid date found: 2024-Feb-29
Valid date found: 2024-Mar-01
Note: The "Valid date found" output now shows the month as an abbreviated name, because the greg::date object is streamed in its default format.
And the following shows several unsuccessful parses:
Enter a sentence containing dates (YYYY-MM-DD format):
The deadline is 2024-02-30.
Invalid date: 2024-02-30 (Does not exist)
Enter a sentence containing dates (YYYY-MM-DD format):
There are no dates in this sentence.
No valid dates found in the input text.
Culturally Aware Date Formatting
Dates are not represented consistently across the globe. Let’s use Boost.Locale to format dates according to the user’s locale. For example:
- US: March 15, 2024
- UK: 15 March 2024
- France: 15 mars 2024
- Germany: 15. März 2024
#include <ctime>
#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/date_time/gregorian/gregorian.hpp>
#include <boost/locale.hpp>
namespace greg = boost::gregorian;
namespace loc = boost::locale;
// Function to check if a date is valid using Boost.Date_Time
bool is_valid_date(int year, int month, int day) {
try {
greg::date test_date(year, month, day);
return true; // If no exception, it's valid
}
catch (const std::exception&) {
return false; // Invalid date
}
}
// Function to format and display dates based on locale
void display_localized_date(const greg::date& date, const std::string& locale_name) {
std::locale locale = loc::generator().generate(locale_name);
std::cout.imbue(locale); // Apply locale to std::cout
// Boost.Locale formats numeric time values rather than greg::date objects,
// so convert the date to seconds since 1970-01-01 (midnight UTC).
std::time_t seconds = static_cast<std::time_t>((date - greg::date(1970, 1, 1)).days()) * 86400;
// The exact text produced depends on the Boost.Locale backend in use
std::cout << locale_name << " formatted date: "
<< loc::as::gmtime << loc::as::date_long << seconds << "\n";
}
// Function to find and validate dates in a text
void find_dates(const std::string& text, const std::string& locale_name) {
boost::regex date_pattern(R"((\d{4})-(\d{2})-(\d{2}))");
boost::smatch match;
std::string::const_iterator start = text.begin();
std::string::const_iterator end = text.end();
bool found = false;
while (boost::regex_search(start, end, match, date_pattern)) {
int year = std::stoi(match[1]);
int month = std::stoi(match[2]);
int day = std::stoi(match[3]);
if (is_valid_date(year, month, day)) {
greg::date valid_date(year, month, day);
std::cout << "Valid date found: " << valid_date << "\n";
display_localized_date(valid_date, locale_name);
}
else {
std::cout << "Invalid date: " << match[0] << " (Does not exist)\n";
}
start = match[0].second; // Move to next match
found = true;
}
if (!found) {
std::cout << "No valid dates found in the input text.\n";
}
}
int main() {
std::locale::global(loc::generator().generate("en_US.UTF-8")); // Default global locale
std::cout.imbue(std::locale()); // Apply to output stream
std::string input;
std::cout << "Enter a sentence containing dates (YYYY-MM-DD format):\n";
std::getline(std::cin, input);
std::string user_locale;
std::cout << "Enter your preferred locale (e.g., en_US.UTF-8, fr_FR.UTF-8, de_DE.UTF-8): ";
std::cin >> user_locale;
find_dates(input, user_locale);
return 0;
}
The following shows successful parses:
Enter a sentence containing dates (YYYY-MM-DD format):
The meeting is on 2024-03-15.
Enter your preferred locale (e.g., en_US.UTF-8, fr_FR.UTF-8, de_DE.UTF-8): en_US.UTF-8
Valid date found: 2024-Mar-15
en_US.UTF-8 formatted date: March 15, 2024
Enter a sentence containing dates (YYYY-MM-DD format):
Rendez-vous le 2024-07-20.
Enter your preferred locale (e.g., en_US.UTF-8, fr_FR.UTF-8, de_DE.UTF-8): fr_FR.UTF-8
Valid date found: 2024-Jul-20
fr_FR.UTF-8 formatted date: 20 juillet 2024
And the following shows an unsuccessful parse:
Enter a sentence containing dates (YYYY-MM-DD format):
The deadline is 2024-02-30.
Enter your preferred locale (e.g., en_US.UTF-8, fr_FR.UTF-8, de_DE.UTF-8): en_US.UTF-8
Invalid date: 2024-02-30 (Does not exist)
Local Time
In a similar global vein, installing the Boost.DateTime library (or all the Boost libraries) also provides a file of time zone definitions from around the world, located at: boost_<version>\libs\date_time\data\date_time_zonespec.csv.
The following short sample shows how to use the contents of the file. Enter a time zone region in IANA format (such as 'Europe/Berlin' or 'Asia/Tokyo'), and the current date and time in that zone will be output.
#include <iostream>
#include <string>
#include <vector>
#include <boost/date_time/local_time/local_time.hpp>
namespace pt = boost::posix_time;
namespace lt = boost::local_time;
int main() {
try {
//---------------------------------------------
// Load the Boost tz_database from CSV
//---------------------------------------------
lt::tz_database tz_db;
tz_db.load_from_file("<YOUR PATH>\\date_time_zonespec.csv"); // Adjust the path to your Boost installation
// Extract all valid timezone names (region_list() already returns a vector)
std::vector<std::string> valid_timezones = tz_db.region_list();
std::string city;
while (true) {
std::cout << "\nEnter 'city/timezone' (or 'exit' to quit, or 'zones' for list of options): ";
std::getline(std::cin, city);
if (city == "exit") break;
if (city == "zones")
{
std::cout << "Available timezones:\n";
for (const auto& tz : valid_timezones) {
std::cout << tz << "\n";
}
}
else
{
// Find the timezone (case-sensitive, must match CSV)
lt::time_zone_ptr tz = tz_db.time_zone_from_region(city);
if (!tz) {
std::cout << "Invalid timezone! Try again.\n";
continue;
}
// Get current UTC time
pt::ptime utc_now = pt::second_clock::universal_time();
// Convert UTC to local time in the chosen timezone
lt::local_date_time local_now(utc_now, tz);
// Get user's local machine time
pt::ptime user_now = pt::second_clock::local_time();
std::cout << "\nYour local system time: " << user_now << "\n";
std::cout << "Current local time in " << city << ": " << local_now << "\n";
}
}
}
catch (const std::exception& e) {
std::cerr << "Fatal error: " << e.what() << "\n";
return 1;
}
return 0;
}
Run the program and test out a few options:
Enter 'city/timezone' (or 'exit' to quit, or 'zones' for list of options): America/New_York
Your local system time: 2025-Sep-03 16:38:02
Current local time in America/New_York: 2025-Sep-03 19:38:02 EDT
Enter 'city/timezone' (or 'exit' to quit, or 'zones' for list of options): Antarctica/South_Pole
Your local system time: 2025-Sep-03 16:38:20
Current local time in Antarctica/South_Pole: 2025-Sep-04 11:38:20 NZST
Enter 'city/timezone' (or 'exit' to quit, or 'zones' for list of options): zones
Available timezones:
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau
Africa/Blantyre
Africa/Brazzaville
Africa/Bujumbura
Africa/Cairo
Africa/Casablanca
Africa/Ceuta
Africa/Conakry
....
Next Steps
If more complex input is required, consider the Boost.Spirit approach to parsing. Refer to Natural Language Processing.